Onboard Data

Load, validate, profile, and protect data with ease — from any source.

Podium’s onboarding process goes well beyond simple ingestion: it automatically profiles, validates, and documents the content, structure, and quality of a wide variety of enterprise data sources, including large or complex ones. Podium generates rich metadata that allows new data sources to be registered in the Podium Metadata Repository and lays the foundation for loading the actual data sets into the Podium data collection and underlying data storage layer.

Podium enables users to onboard, convert, load, validate, and profile data with ease and efficiency through Podium’s powerful ingestion framework. The process produces clean data sets, with managed history, registered in Podium’s data storage layer and ready to query.

Podium provides guided wizards to build metadata-driven processes for ingesting relational databases, mainframe data sources, JSON and XML files, and flat files. Metadata can be imported directly from the source when available, or created with automated assistance from Podium. This metadata is used to define the source format, reformat data into standard structures, and validate the data as it is loaded. Podium automatically converts nonstandard data formats (such as mainframe files and hierarchical XML records) to standardized character sets and formats. This includes normalizing hierarchical XML data structures into tables that can then be queried like relational tables.
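As an illustration of what normalizing hierarchical XML into queryable tables means, the following is a minimal Python sketch using only the standard library. It is not Podium's implementation or API: the sample document, the `flatten_orders` function, and the table and column names are all hypothetical. It shows nested `order`/`item` elements becoming a parent table and a child table linked by a shared key.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE_XML = """
<orders>
  <order id="1001">
    <customer>Acme</customer>
    <item sku="A-1" qty="2"/>
    <item sku="B-7" qty="1"/>
  </order>
  <order id="1002">
    <customer>Globex</customer>
    <item sku="A-1" qty="5"/>
  </order>
</orders>
"""


def flatten_orders(xml_text):
    """Return (orders, items): a parent table and a child table keyed by order_id."""
    root = ET.fromstring(xml_text.strip())
    orders, items = [], []
    for order in root.findall("order"):
        order_id = int(order.get("id"))
        # One row per <order> element in the parent table.
        orders.append({"order_id": order_id,
                       "customer": order.findtext("customer")})
        # One row per nested <item>, carrying the parent key as a foreign key.
        for item in order.findall("item"):
            items.append({"order_id": order_id,
                          "sku": item.get("sku"),
                          "qty": int(item.get("qty"))})
    return orders, items


orders, items = flatten_orders(SAMPLE_XML)
```

Each nested level becomes its own flat table, and the repeated `order_id` column lets the two tables be joined like ordinary relational tables.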

Podium also manages the history of ingested data, automatically updating partitions in the Podium Metadata Repository and correctly managing incremental snapshot updates. This history and update information can be synchronized with other metadata repositories, such as HCatalog, Atlas, or Navigator. Podium’s data sourcing capabilities can be scheduled for automatic execution in production environments. The onboarding process runs natively on high-performance, parallel, multiprocessor platforms, so onboarding stays fast and scales as data volumes grow.
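The idea of partitioned history with incremental snapshot updates can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Podium's mechanism: the `history` dictionary, the load identifiers, and the functions `apply_increment` and `current_snapshot` are hypothetical, and the sketch assumes each record has a unique `id` key and that later loads override earlier ones.

```python
def current_snapshot(history, key="id"):
    """Rebuild the latest view by replaying partitions in load order."""
    snapshot = {}
    for load_id in sorted(history):
        for record in history[load_id]:
            # A later partition overrides earlier values for the same key.
            snapshot[record[key]] = record
    return snapshot


def apply_increment(history, load_id, records, key="id"):
    """Register a new partition and return the merged current snapshot."""
    if load_id in history:
        raise ValueError(f"partition {load_id!r} already loaded")
    history[load_id] = list(records)  # every load is kept as its own partition
    return current_snapshot(history, key)


# Usage: a full load followed by an incremental load (hypothetical data).
history = {}
apply_increment(history, "2024-01-01", [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
snapshot = apply_increment(history, "2024-01-02", [{"id": 2, "v": "c"}])
```

Because each load is stored as a separate partition, earlier states remain available while the merged snapshot reflects the most recent value for each key.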


Key Capabilities:

  • Onboard data and metadata from all sources and formats
  • Standardize record formats, data types, and character sets
  • Convert mainframe, JSON, and XML data to queryable format
  • Validate records against expected formats/types/values
  • Profile new fields of data to generate a statistical profile
  • Apply post-processing rules, find/act on insights in data
  • Maintain a standardized managed filesystem taxonomy
  • Manage the history of ingested data
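
To make the validation and profiling capabilities in the list above concrete, here is a minimal Python sketch. It does not reflect Podium's rule syntax or API: the `SCHEMA` mapping, the field names, and the `validate` and `profile` functions are assumptions made for illustration. Records are checked against expected types, and a simple statistical profile is computed for one field.

```python
from collections import Counter

# Hypothetical expected type per field; real metadata would carry richer rules.
SCHEMA = {"id": int, "country": str, "amount": float}


def validate(record, schema=SCHEMA):
    """Return a list of problems found in one record (empty list means valid)."""
    errors = []
    for field, caster in schema.items():
        value = record.get(field)
        if value is None or value == "":
            errors.append(f"{field}: missing")
            continue
        try:
            caster(value)  # the value must be convertible to the expected type
        except (TypeError, ValueError):
            errors.append(f"{field}: {value!r} is not {caster.__name__}")
    return errors


def profile(records, field):
    """Compute basic statistics for one field across all records."""
    values = [r.get(field) for r in records]
    present = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "top": Counter(present).most_common(1),
    }


rows = [
    {"id": "1", "country": "US", "amount": "9.5"},
    {"id": "x", "country": "", "amount": "3"},
]
problems = [validate(r) for r in rows]
country_profile = profile(rows, "country")
```

Records that fail validation can be routed aside for review, while the profile gives a quick view of completeness and cardinality for each new field.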

Learn More: