The Next Generation of Managing Enterprise Data: Data Conductor
As featured on insideBigData.com
This article, the second in a three-part series, focuses on the need for increased data agility. Transforming data from “raw to ready” depends not only on deeper insights, but on the ability to access and work with all enterprise data on any platform – not just Hadoop. With capabilities to act as what we call a “data conductor,” businesses are free to work with any data at any time, anywhere, to best serve business goals, free of the constraints of any single data management environment. “Data Conductor” is a Podium capability that lets you control the level of management applied to each data set while retaining a rich, complete metadata catalog for the enterprise.
Enter Data Conductor
The basic needs of managing enterprise data are shifting within the business. Given data's growing importance, its increasing volume, and the demand to leverage it, a new model is necessary. The data lake serves this purpose. Lakes are more than just the latest in an ongoing series of data and analytic reporting platforms such as databases, data warehouses, and analytic appliances. Instead, data lakes are part of the next-generation, enterprise-scale architecture for managing big data and delivering massively expanded analytics for today's enterprises.
One of the key challenges in data lake architecture is how to integrate all data in the enterprise to optimal effect. Even with its clear benefits, the reality is that the lake must thrive in a well-populated landscape of data management platforms, processes, and standards. Should all of the enterprise's information be ingested into the lake, or only a subset, and why? What are the tradeoffs and considerations that define the right answers to these questions?
Podium’s data conductor capabilities give organizations flexible options to ensure a smooth transition toward this new architecture without sacrificing the efficiencies of day-to-day operations.
Why does Data Conductor Matter?
- Faster time to insight – To reach answers faster, many organizations have chosen to deploy a data lake within the enterprise. However, data governance policies around certain data classifications and sets may require that data to remain stored outside of the lake. With data conductor, organizations still have a holistic view of their data sources at their fingertips, whether or not the data itself resides in the lake. Entities set at the addressed or registered level yield anything from basic source metadata to full validation and profiling details. If further use of the data is necessary, promote it to fully managed status, either permanently or once, on demand. When data is needed only for archival purposes, simply demote it back to the desired level of management.
- Simplify the data management process – Many organizations have already deployed a data lake, but are looking to better leverage its potential as a means to provide high operational value and deliver faster business insights. However, the steps necessary to maintain proper data quality assurance and governance often lead to complexities the business is unequipped to handle. Podium’s data conductor feature provides the flexibility to avoid these complications: quickly determine where data should reside without sacrificing visibility across the enterprise.
- Eliminate unnecessary data duplication – While storage is inexpensive, it is not free, and best practices emphasize judicious investment in hardware. In addition, buying more storage initiates a purchasing process, which within larger organizations can take weeks or months to approve. The flexibility of the data conductor capability ensures organizations can do more with the data and storage already in place.
- Diagnose the data anytime, anywhere – Want to understand details of the data without actually copying it into the lake? With data conductor, organizations can bring in the metadata and perform full validation, profiling and analysis while keeping the data at its original source. This means users can understand what is available quickly and easily, eliminating guesswork or long processes to find out.
- Effectively manage archived data – Organizations with high volumes of data also have large information archives. Sometimes insights from archived data are necessary, for example, once a year for auditing purposes. Creating and retaining another copy of the data for a once-a-year purpose adds complexity that is unnecessary when leveraging data conductor functionality.
- Identify opportunities through the metadata – Data conductor allows users to understand their source data before copying it into a fully-managed state. Users are able to scrutinize the source data, including Mainframe or RDBMS data, without moving it. Learn from the metadata first and promote it when the value-add facilitates faster, more effective business decisions.
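The promote-and-demote workflow above can be pictured as a small state machine over per-entity management levels in a metadata catalog. The sketch below is purely illustrative; the level names, class, and method names are assumptions for the sake of the example, not Podium's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical management levels, ordered from lightest to fullest management.
class Level(Enum):
    ADDRESSED = 1   # source location known; basic metadata only
    REGISTERED = 2  # validation and profiling captured; data stays at source
    MANAGED = 3     # data copied into the lake under full governance

@dataclass
class Entity:
    name: str
    source: str
    level: Level = Level.ADDRESSED
    profile: dict = field(default_factory=dict)  # profiling stats, if captured

    def promote(self) -> None:
        """Raise the management level one step, e.g. for on-demand use in the lake."""
        if self.level is not Level.MANAGED:
            self.level = Level(self.level.value + 1)

    def demote(self) -> None:
        """Lower the management level, e.g. when data is kept only for archival."""
        if self.level is not Level.ADDRESSED:
            self.level = Level(self.level.value - 1)

# The catalog keeps every entity visible regardless of where its data lives.
catalog = {"claims": Entity("claims", "mainframe://prod/claims")}

e = catalog["claims"]
e.promote()                                      # ADDRESSED -> REGISTERED
e.profile = {"rows": 1_200_000, "null_pct": 0.02}  # profiled at the source
e.promote()                                      # REGISTERED -> MANAGED
e.demote()                                       # back down for archival use
print(e.level.name)                              # -> REGISTERED
```

The point of the sketch is that the catalog entry, including its profile, survives every transition, which is what gives the enterprise a holistic view even when the underlying bytes never enter the lake.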
Stay tuned for the final article of this three-part series, where we explore combining flexibility with more insights and agility to facilitate the evolution of the self-service data marketplace like never before.