Podium Pointer #11: Taking Data Lineage to the Next Level in the Data Marketplace
Welcome to another installment of “Podium Pointers,” a series dedicated to tackling complex topics in as few words as possible and leading readers to additional resources for information and assistance.
As CDOs shift their strategic mindset from defense to offense, the value of effective data lineage grows with it. The big question is how to ensure that the quality of end results across the business gives you confidence that you are getting value from your data investment. The concept of garbage in / garbage out tells only part of the story. Data moves and data changes, often undergoing a variety of transformations. Thus, data deemed to come from a good-quality source can, in the end, still fail to provide the anticipated value-add to the business. This unknown keeps business leaders across all industries up at night.
In the past, data lineage that answered the question of “where” was enough to meet business needs, and there are many different views to choose from that map out where the data has come from and gone. However, understanding not just where the data has moved, but how it has changed, has become just as critical. Within Podium, users take data lineage to the next level. Down to the individual field (column), the Podium Data Marketplace provides not just a view of all parent/child relationships, but also visibility into how those fields are used throughout the identified preparation dataflows.
Let’s walk through an example. After building a dataflow in Podium’s Prepare Module, you know the newly created target datasets leverage the entity “payment_fact” and the specific field “customer_id”. Before moving forward, you want to understand where else, and how, this field is being used throughout the marketplace. From the payment_fact entity, you see the data source named “POSTGRES_DIGITAL_DEMO”. A quick global search in the top right brings you directly to the source.
From the source, drill down to the specific entity, then to the field of interest, and choose “View Lineage” from the “More” dropdown list to access the details you need.
Since we’re looking at the field from the source perspective, there is no parent lineage. Within the child lineage, this “customer_id” field is shown to be used in four places, with the details of all prepare operations provided:
- The data is loaded into the Podium Data Marketplace.
- It is used within the “DIGITAL_WORKFLOW_DEMO” dataflow shown above to create the new “CUST_AVG_SPEND” entity, where the field was filtered, joined, routed and then aggregated.
- It is used within the same “DIGITAL_WORKFLOW_DEMO” dataflow to create the new “FLAT_WIDE_B” entity, where the field was filtered, joined and routed.
- It is also used within another prepare dataflow, “COX_WORKFLOW_DEMO”, to create an “AGGREGATE_A” entity, where the field was filtered in a different manner, joined, routed in a different manner and aggregated.
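To make the shape of this child lineage concrete, the four usages above can be written down as plain records. This is only an illustrative sketch in Python, not Podium's API or export format: the `ChildLineage` class, its attribute names and the two helper functions are hypothetical, and only the entity names, dataflow names and operations come from the walkthrough.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ChildLineage:
    """One child-lineage entry for a field (hypothetical structure)."""
    dataflow: Optional[str]        # None for the initial load, which has no dataflow
    target_entity: str             # entity in which the field ends up
    operations: Tuple[str, ...]    # prepare operations applied, in order


# The four usages of payment_fact.customer_id described in the walkthrough.
CHILD_LINEAGE = [
    ChildLineage(None, "payment_fact", ("load",)),
    ChildLineage("DIGITAL_WORKFLOW_DEMO", "CUST_AVG_SPEND",
                 ("filter", "join", "route", "aggregate")),
    ChildLineage("DIGITAL_WORKFLOW_DEMO", "FLAT_WIDE_B",
                 ("filter", "join", "route")),
    ChildLineage("COX_WORKFLOW_DEMO", "AGGREGATE_A",
                 ("filter", "join", "route", "aggregate")),
]


def entities_where(operation: str,
                   lineage: List[ChildLineage] = CHILD_LINEAGE) -> List[str]:
    """Answer the "how" question: which targets applied this operation?"""
    return [rec.target_entity for rec in lineage if operation in rec.operations]


def dataflows_using_field(lineage: List[ChildLineage] = CHILD_LINEAGE) -> List[str]:
    """Answer the "where" question: which dataflows touch the field?"""
    return sorted({rec.dataflow for rec in lineage if rec.dataflow is not None})
```

Calling `entities_where("aggregate")` on these records picks out the two entities where the field was aggregated, while `dataflows_using_field()` lists the two dataflows that consume it. Note that the records capture only the operation names; the text says the filter and route steps in “COX_WORKFLOW_DEMO” differ in detail, which a fuller model would store as rule text.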
Clicking the “AGGREGATE_A” entity in the Child Lineage shown above and drilling into its fields confirms that the “customer_id” field referenced in the lineage is used within that entity. From the Child Lineage “Rule,” you now know how the “customer_id” field is populated within the “AGGREGATE_A” entity via the prepare dataflow operations.
This detailed view is accessible no matter where you are along the lifecycle of the field. Viewing the lineage of a field from a newly created dataflow target reveals parent lineage describing the “where” upstream. In addition, you will understand the “where” and “how” of this particular field in any child entities if it is being used in other dataflows. That’s next-level data lineage within the marketplace.
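The parent/child idea can be pictured as a small directed graph of fields that is walked in either direction. The sketch below is a generic illustration in Python, not how Podium stores or queries lineage; the function names are hypothetical, and it assumes the field keeps the name “customer_id” in each target entity, which the walkthrough does not state for every target.

```python
from collections import defaultdict

# (parent field, child field) pairs taken from the walkthrough.
# Assumption: the target field is also called customer_id in every entity.
EDGES = [
    ("payment_fact.customer_id", "CUST_AVG_SPEND.customer_id"),
    ("payment_fact.customer_id", "FLAT_WIDE_B.customer_id"),
    ("payment_fact.customer_id", "AGGREGATE_A.customer_id"),
]


def build_indexes(edges):
    """Index the edges both ways so lineage can be walked up or down."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def walk(start, index):
    """Return every field reachable from start by following index transitively."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in index.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


PARENTS, CHILDREN = build_indexes(EDGES)
```

Here `walk("payment_fact.customer_id", PARENTS)` is empty, matching the observation that a source field has no parent lineage, while `walk("AGGREGATE_A.customer_id", PARENTS)` leads back to the source field, and walking `CHILDREN` from the source reaches all three targets.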
In today’s world of big data, lineage takes on a new, more important role in increasing trust and speeding the process of generating data-driven insights. Contact us to learn more about how to reach a level of data understanding like never before.