
How Metadata Delivers Value to the Data Lake

Recently, a Gartner analyst who covers data lakes mentioned to Podium that metadata remains a confusing topic for some IT teams considering or working on data lake projects. Here’s how that question generally sounds in his conversations with Gartner’s corporate subscribers:

“We know having good metadata in the lake is important, but we are less clear on what that really means in a practical sense. How specifically do certain types of metadata add value or deliver insight in the data lake? And who are the individuals that stand to benefit the most from a robust metadata repository?”

So in this blog we are going to bullet out 10 specific real-world examples of where having metadata in a data lake delivers real insight and utility for both the people building the data lake and those using it (aka the data analysts).

  1. Metadata allows self-service on-demand access to data in the lake by non-technical users. By describing the key characteristics surrounding data sets, metadata can give business users the ability to understand what they’re dealing with.
  2. Metadata helps you understand and track the quality of data in the lake. Just as metadata outlines pertinent characteristics like structure and content, it also enables users to identify and understand data quality.
  3. Metadata can provide a complete profile of each data set. By generating a statistical profile of a data set, metadata enables people accessing data to understand the data distribution of what they’re working with.
  4. Metadata allows for easy and consistent protection of sensitive data. It’s not just important to keep PII and other types of sensitive data secure – it’s vital in order to remain compliant. Metadata enables tagging of these types of data.
  5. Metadata helps companies stay in compliance with regulatory bodies. Metadata creates an “audit path,” which is another important factor in industries like financial services and healthcare.
  6. Metadata reduces duplicative ETL efforts. Instead of having to build the same data set twice, metadata enables users to access data generated at particular points, thereby eliminating unnecessary work.
  7. Metadata allows people to collaborate more effectively. By utilizing a common language that doesn’t require technical acumen, metadata enables different teams to understand and utilize data in a uniform, efficient way.
  8. Metadata makes it easier to control and document access. When high volumes of data are being discussed, it can be difficult to keep track of who is using what and when. Metadata enables administrators and overseers of data to set, maintain, and document access.
  9. Metadata makes data transparent. From the moment data is on-boarded to the moment it is viewed, metadata tracks the journey the data has taken, making it easier to understand and trust.
  10. Metadata makes previously “dark” data accessible to users. When an organization has an extensive amount of “dirty” data in a mainframe system, that data doesn’t provide much value. But with metadata that shows its structure and contents, it has the potential to become useful again.
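
To make two of these points a little more concrete (the statistical profile in item 3 and the tagging of sensitive data in item 4), here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `FieldMetadata` record, the `profile_field` function, and the simple name-based `sensitive_names` check. It is not a description of how Podium or any other product implements metadata.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional, Set


@dataclass
class FieldMetadata:
    """Hypothetical catalog record describing one field of a data set."""
    name: str
    row_count: int
    null_count: int
    distinct_count: int
    minimum: Optional[float]
    maximum: Optional[float]
    average: Optional[float]
    tags: Set[str] = field(default_factory=set)


def profile_field(name, values, sensitive_names=frozenset({"ssn", "email"})):
    """Build a simple statistical profile and tag the field if its name looks sensitive.

    The name-based check is an assumption made for this sketch; real tools
    typically inspect the values themselves as well.
    """
    present = [v for v in values if v is not None]
    numeric = [v for v in present if isinstance(v, (int, float))]
    meta = FieldMetadata(
        name=name,
        row_count=len(values),
        null_count=len(values) - len(present),
        distinct_count=len(set(present)),
        minimum=min(numeric) if numeric else None,
        maximum=max(numeric) if numeric else None,
        average=mean(numeric) if numeric else None,
    )
    if name.lower() in sensitive_names:
        meta.tags.add("PII")
    return meta
```

Calling `profile_field("age", [30, None, 45, 30])` would yield a record showing four rows, one null, and two distinct values, which is the kind of at-a-glance summary a business user can read without opening the raw data. A field named `SSN` would come back carrying a `PII` tag that downstream access controls could act on.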

Obviously, metadata isn’t the first thing that data analysts and business users think of when they set out to find pertinent information. But by making the entire process easier and improving the quality and security of the data, it plays an integral role in the success of any organization’s data efforts.

Want to learn more about how Podium handles metadata in the data lake? Check out our Technical Overview.