The Next Generation of Managing Enterprise Data: Intelligent Data Identification
As featured on insideBigData.com
The paradigm of managing data at the enterprise level continues to evolve. To thrive in today’s market, businesses must demand more from their data: more insights, more agility and more flexibility. Users crave self-serve, trusted data, and a few innovative vendors are delivering on this objective. Within a single, integrated self-service environment, data producers and consumers throughout the business are working together like never before to leverage the value of analytics and data-driven decision making. As the business evolves, capabilities within a Podium data marketplace evolve with it to meet a greater number of challenges. This article, the first in a three-part series, focuses on the ability to generate more insights, faster, with intelligent data identification.
Intelligent Data Identification
Organizations thriving in today’s big-data-driven environment quickly harness key insights to achieve strategic business objectives. In fact, Forrester suggests “insights-driven businesses will steal $1.2 trillion in 2020*” from those unable to harness more, better data to support continuous development. To be part of this next generation, a business must go beyond effectively managing enterprise data and be able to convert that data into impactful answers. Unlocking this capability requires greater cooperation between IT, data and business teams, facilitating self-service access to trusted, business-ready data. The common thread connecting these cross-functional teams, and enabling them to become the desired insights-driven business, is the metadata.
Businesses know that having good metadata is important, but it is less clear how using metadata beyond simple cataloging or classification adds value and delivers the key insights that drive the business forward. A rich metadata repository provides benefits such as easily tracking data quality, maintaining regulatory compliance, and saving significant time by creating a common language across divisions so that data is understood and used in a uniform manner. The ability to “drill deeper” into the data via profiling and validation statistics can enrich one’s understanding even more. However, it’s not the metadata alone that provides this benefit. Generating more useful insights necessitates the use of enhanced profiling statistics.
Intelligent Data Identification capabilities go beyond the metadata repository to leverage 80+ profiling statistics for core data insights and to deliver on key value-add business use cases. From discovering data type inconsistencies to detecting Personally Identifiable Information (PII), intelligent data identification delivers self-service convenience with the faster, more valuable insights that analytics users crave.
Profile Statistics Make Metadata, Better Data
Metadata plays a fundamental role for the business in a data-driven environment. It is used to manage data access rights and to store and enforce the policies that define production readiness, to name a few essential uses. Going beyond standard ingest, Podium also uses the source data itself to automatically analyze, summarize and create profile statistics on every field to enable global search and discovery. Furthermore, this enables real-time detection, reporting and alerting of PII and duplicate data, which is only possible through an integrated metadata catalog.
By utilizing 80+ profile and validation statistics, Podium Data brings clarity to data like never before, exposing instant business-ready insights such as:
- Data contents conflict with anticipated details
- The data is compromised (missing a natural key, for example)
- The data does not comply with the expected format or data types
- New revelations in seconds such as “40% of all credit card transactions were from swipes, 60% came from chip transactions”
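To make the idea of per-field profile statistics concrete, here is a minimal Python sketch. It is not Podium's implementation, and the article does not enumerate its 80+ statistics; the function names `infer_type` and `profile_field`, the handful of statistics chosen, and the sample data are all assumptions made for illustration. It shows how a few simple counts can surface data type inconsistencies and value distributions like the swipe/chip split above.

```python
from collections import Counter


def infer_type(value):
    """Classify a raw string value as 'null', 'integer', 'decimal' or 'string'."""
    if value is None or value.strip() == "":
        return "null"
    text = value.strip()
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "string"


def profile_field(values):
    """Compute a few illustrative profile statistics for one field (column)."""
    type_counts = Counter(infer_type(v) for v in values)
    non_null = [v.strip() for v in values if infer_type(v) != "null"]
    frequencies = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": type_counts.get("null", 0),
        "distinct_count": len(frequencies),
        # More than one non-null type here hints at a data type inconsistency.
        "type_counts": dict(type_counts),
        # Share of each distinct value among the non-null values.
        "value_share": {v: c / len(non_null) for v, c in frequencies.items()},
    }


# Hypothetical sample: the entry mode of ten card transactions.
entry_modes = ["swipe"] * 4 + ["chip"] * 6
stats = profile_field(entry_modes)
print(stats["value_share"])
```

A field expected to hold integers whose `type_counts` also reports `string` values would be flagged as not complying with the expected data type.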
Increase Business Agility with Pattern Recognition
Throughout the organization, specific use cases are identified and prioritized in terms of criticality based on aspects such as maintaining regulatory compliance or improving operational efficiency. A clear view of the data alone is not enough when it comes to faster insights for complex decisions. When pattern recognition and rules engines are combined with insight based on detail-rich “better data”, actions are taken with trusted intelligence, cutting time and costs and advancing true “Business Agility” for next generation enterprise data management.
Podium leverages pattern recognition to address a variety of use cases, including Personally Identifiable Information (PII) detection as depicted below. Using standard algorithms such as the Luhn formula, or any customized variations, any data can be run against a rules engine to identify all the locations of potential PII. Users can then choose what to do with the PII data: tag the relevant fields within their properties and/or automatically mask the data with encryption, tokenization or obfuscation techniques. This ensures both that the business understands where PII data resides and that the data is quickly made available for the rest of the business to use.
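As an illustration of the Luhn check mentioned above, the following Python sketch finds candidate payment card numbers in free text, validates them with the Luhn formula, and masks all but the last four digits. It is a simplified example rather than Podium's rules engine: the regular expression, the 13 to 19 digit length range, and the helper names `luhn_valid`, `find_card_numbers` and `mask` are assumptions chosen for the example.

```python
import re

# Candidate pattern (an assumption): 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    checksum = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return substrings that look like card numbers and pass the Luhn check."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]


def mask(number: str) -> str:
    """Replace every digit except the last four with '*', keeping separators."""
    total = sum(ch.isdigit() for ch in number)
    seen = 0
    masked = []
    for ch in number:
        if ch.isdigit():
            seen += 1
            masked.append(ch if seen > total - 4 else "*")
        else:
            masked.append(ch)
    return "".join(masked)


# 4111 1111 1111 1111 is a widely used test number that passes the Luhn check.
sample = "Card 4111 1111 1111 1111 on file, order 1234567890123"
for hit in find_card_numbers(sample):
    print(mask(hit))
```

Passing the Luhn check only marks a value as potential PII; a real rules engine would typically combine it with other signals, such as field names or issuer prefixes, before tagging or masking a field.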
Another challenge businesses face is managing duplicate data. Under the traditional siloed process, each group kept its own copy of the data, so populating the data lake is likely to produce exact duplicates. Utilizing Podium’s rich profiling information, customers can automatically identify exact duplicate data with over 99.9% certainty and eliminate the headaches caused by working with duplicate data.
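The article does not describe how profiling information is used to detect duplicates, so the sketch below swaps in a different, well-known technique: content fingerprinting with a cryptographic hash. The function names, the unit-separator delimiter and the sample datasets are hypothetical. Each row is hashed, the row hashes are sorted so that row order does not matter, and datasets sharing the same overall fingerprint are reported as exact duplicates.

```python
import hashlib
from collections import defaultdict


def dataset_fingerprint(rows):
    """Order-insensitive fingerprint of a dataset given as rows of strings."""
    # Hash each row; the unit separator keeps ("ab", "c") distinct from ("a", "bc").
    row_hashes = sorted(
        hashlib.sha256("\x1f".join(row).encode("utf-8")).hexdigest()
        for row in rows
    )
    digest = hashlib.sha256()
    for row_hash in row_hashes:
        digest.update(row_hash.encode("ascii"))
    return digest.hexdigest()


def find_exact_duplicates(datasets):
    """Group dataset names whose contents are identical, ignoring row order."""
    groups = defaultdict(list)
    for name, rows in datasets.items():
        groups[dataset_fingerprint(rows)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical copies of the same table kept by two different groups.
datasets = {
    "sales_copy": [("1", "a"), ("2", "b")],
    "finance_copy": [("2", "b"), ("1", "a")],
    "other": [("3", "c")],
}
print(find_exact_duplicates(datasets))
```

A cheaper variant of the same idea would compare lightweight statistics first (row counts, per-field distinct counts) and compute full fingerprints only for datasets whose statistics match.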
Becoming a true “insights-driven” business goes beyond managing the data. Metadata and other insights about the data can produce intelligence and faster, more accurate decisions as part of an enterprise’s daily practice, not as something on a roadmap. Enhancing the metadata with rich profiling statistics, combined with pattern recognition, identifies the critical business-ready data faster and without data experts. This facilitates the proficient use of data to solve the industry-specific challenges important to maintaining a competitive edge.
Stay tuned for the second part of this series, focused on combining insights with more agility - utilizing all data collected, no matter where it is generated or stored, to best serve enterprise needs.
*Forrester: The Insights-Driven Business, July 2016