
metadata enrichment, data dictionary and data lineage in Hive

Contributor

Hive tables can store comments on columns when the table is created.
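As a minimal sketch, assuming the column descriptions live in a small Python dictionary (the table and column names below are hypothetical), the comments can be generated into the DDL instead of being typed by hand. They can also be updated later with `ALTER TABLE ... CHANGE`, which keeps the column name and type but replaces the comment:

```python
def quote_hive_string(text: str) -> str:
    """Escape a value for use inside a single-quoted HiveQL string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_table_ddl(table: str, columns: dict, table_comment: str = "") -> str:
    """Render CREATE TABLE with a COMMENT on every column.

    `columns` maps column name -> (Hive type, description); it acts as a
    tiny, hypothetical data dictionary.
    """
    col_lines = [
        f"  {name} {hive_type} COMMENT {quote_hive_string(desc)}"
        for name, (hive_type, desc) in columns.items()
    ]
    ddl = f"CREATE TABLE {table} (\n" + ",\n".join(col_lines) + "\n)"
    if table_comment:
        ddl += f"\nCOMMENT {quote_hive_string(table_comment)}"
    return ddl


def alter_comment_ddl(table: str, column: str, hive_type: str, desc: str) -> str:
    """Render ALTER TABLE ... CHANGE that rewrites one column's comment in place."""
    return (f"ALTER TABLE {table} CHANGE {column} {column} {hive_type} "
            f"COMMENT {quote_hive_string(desc)}")


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    dictionary = {
        "customer_id": ("BIGINT", "Surrogate key from the CRM extract"),
        "signup_date": ("DATE", "Date the customer's account was created"),
    }
    print(create_table_ddl("sales.customers", dictionary, "Daily CRM snapshot"))
    print(alter_comment_ddl("sales.customers", "signup_date", "DATE",
                            "UTC date the account was created"))
```

Keeping the dictionary as the source of truth means the comments stored in the metastore and any exported data dictionary stay in sync.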

Oozie is used to load the tables: data is first copied from remote nodes to the edge node and then loaded into Hive tables.
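One way to keep lineage for the hops that happen outside Hive is to record each copy and load step explicitly. The sketch below is a toy in-memory graph, not an Oozie or Atlas feature; the paths, table name, and step labels are hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy dataset-level lineage: each dataset remembers what it was derived from."""

    def __init__(self):
        # dataset -> set of (source dataset, step that produced it)
        self.parents = defaultdict(set)

    def record(self, source: str, target: str, step: str) -> None:
        """Record that `step` produced `target` from `source`."""
        self.parents[target].add((source, step))

    def upstream(self, dataset: str) -> list:
        """Return every (source, step, target) edge feeding `dataset`, back to the origins."""
        edges, stack, seen = [], [dataset], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for source, step in sorted(self.parents.get(node, ())):
                edges.append((source, step, node))
                stack.append(source)
        return edges


if __name__ == "__main__":
    g = LineageGraph()
    # Hypothetical names for the two-hop load: remote node -> edge node -> Hive.
    g.record("sftp://remote-host/exports/orders.csv",
             "file:///data/landing/orders.csv", "oozie:copy-to-edge")
    g.record("file:///data/landing/orders.csv",
             "hive://sales.orders", "oozie:load-into-hive")
    for source, step, target in g.upstream("hive://sales.orders"):
        print(f"{source} --[{step}]--> {target}")
```

In a real workflow, each Oozie action would call something like `record` once it succeeds, so the graph only contains hops that actually happened.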

What are the recommended practices for metadata enrichment, building a data dictionary, and maintaining data lineage?

Accepted Solution

Re: metadata enrichment, data dictionary and data lineage in Hive

New Contributor

Starting in HDP 2.3, Atlas can automatically create Hive entities (databases, tables, views, and columns) through a Hive post-execution hook. These entities can then be decorated with additional metadata by associating Traits with them (think of Traits as complex, attribute-bearing tags).
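To make the entity/trait idea concrete, here is a toy model of what the hook-driven flow amounts to. It is not the Atlas API: `MetadataStore`, `on_post_execute`, and the type and trait names are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Trait:
    """An attribute-bearing tag, e.g. a 'PII' trait with a 'level' attribute."""
    type_name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Entity:
    type_name: str        # illustrative, e.g. "hive_table" or "hive_column"
    qualified_name: str   # unique name, e.g. "sales.customers.email"
    traits: list = field(default_factory=list)


class MetadataStore:
    """Toy stand-in for a metadata repository (NOT the Atlas API)."""

    def __init__(self):
        self.entities = {}

    def on_post_execute(self, outputs: list) -> None:
        """Hypothetical post-execution callback: register each output entity once."""
        for type_name, qualified_name in outputs:
            self.entities.setdefault(qualified_name, Entity(type_name, qualified_name))

    def add_trait(self, qualified_name: str, trait: Trait) -> None:
        """Attach an attribute-bearing tag to an already registered entity."""
        self.entities[qualified_name].traits.append(trait)

    def find_by_trait(self, type_name: str) -> list:
        """Qualified names of all entities carrying a trait of the given type."""
        return sorted(e.qualified_name for e in self.entities.values()
                      if any(t.type_name == type_name for t in e.traits))


if __name__ == "__main__":
    store = MetadataStore()
    # Simulate the hook firing after a hypothetical CREATE TABLE sales.customers (...).
    store.on_post_execute([("hive_table", "sales.customers"),
                           ("hive_column", "sales.customers.email")])
    store.add_trait("sales.customers.email", Trait("PII", {"level": "high"}))
    print(store.find_by_trait("PII"))
```

The point of the design is that registration is automatic and idempotent, while enrichment (traits) is a separate, deliberate step.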

Atlas also provides an extensible mechanism for creating a taxonomy of business terms. This separate taxonomy can then be associated with specific Hive assets (entities).
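A minimal sketch of a business-term taxonomy kept separate from the technical entities, with terms mapped onto Hive assets by qualified name. The class and the term and asset names are hypothetical, and the hierarchy is assumed to be a tree (no cycles).

```python
from typing import Optional


class Taxonomy:
    """Hypothetical business-term hierarchy mapped onto Hive assets."""

    def __init__(self):
        self.children = {}  # term -> list of child terms
        self.assets = {}    # term -> set of Hive qualified names

    def add_term(self, term: str, parent: Optional[str] = None) -> None:
        """Add a term, optionally under a parent term."""
        self.children.setdefault(term, [])
        if parent is not None:
            self.children.setdefault(parent, []).append(term)

    def assign(self, term: str, asset: str) -> None:
        """Associate a business term with a Hive asset (database, table, column)."""
        self.assets.setdefault(term, set()).add(asset)

    def assets_under(self, term: str) -> list:
        """Assets tagged with `term` or any descendant term (assumes a tree)."""
        found, stack = set(), [term]
        while stack:
            current = stack.pop()
            found |= self.assets.get(current, set())
            stack.extend(self.children.get(current, []))
        return sorted(found)


if __name__ == "__main__":
    tax = Taxonomy()
    tax.add_term("Finance")
    tax.add_term("Revenue", parent="Finance")
    tax.assign("Revenue", "sales.orders.amount")
    tax.assign("Finance", "finance.ledger")
    print(tax.assets_under("Finance"))
```

Because the terms live outside the entities, a business user can browse "Finance" and reach every Hive asset filed under it or its sub-terms.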

For more information, refer to http://incubator.apache.org/projects/atlas.html.

Re: metadata enrichment, data dictionary and data lineage in Hive

Contributor

We just rolled out Waterline Data (http://www.waterlinedata.com/) on a production cluster (HDP 2.2.8). It's a great tool for metadata enrichment, data dictionaries, data lineage, and auto-discovery of HDFS and Hive data, and it runs on top of Hadoop (YARN). It is ready to "speak" with Atlas through its API, and it has a great web UI.

One of the coolest features is the ability to create an external Hive table from HDFS data in two clicks through the web UI.
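For illustration only (this is not Waterline Data's implementation), creating an external table over an HDFS directory boils down to generating DDL like the following. The sketch assumes a simple comma-separated sample with a header row and no quoted commas; the table name and location are hypothetical.

```python
import csv
import io


def infer_hive_type(values: list) -> str:
    """Return the narrowest of BIGINT, DOUBLE, STRING that fits every non-empty value."""
    def fits(cast):
        try:
            for v in values:
                if v != "":
                    cast(v)
        except ValueError:
            return False
        return True

    if fits(int):
        return "BIGINT"
    if fits(float):
        return "DOUBLE"
    return "STRING"


def external_table_ddl(table: str, location: str, sample_csv: str) -> str:
    """Render CREATE EXTERNAL TABLE over a CSV directory, typing columns from a sample.

    Assumes the sample's first row is a header and that fields contain no quoted
    commas (the plain delimited row format below does not handle CSV quoting).
    """
    rows = list(csv.reader(io.StringIO(sample_csv)))
    header, data = rows[0], rows[1:]
    cols = [f"  {name} {infer_hive_type([r[i] for r in data])}"
            for i, name in enumerate(header)]
    return (f"CREATE EXTERNAL TABLE {table} (\n" + ",\n".join(cols) + "\n)\n"
            "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
            f"LOCATION '{location}'\n"
            "TBLPROPERTIES ('skip.header.line.count'='1')")


if __name__ == "__main__":
    # Hypothetical sample taken from the head of a file in the HDFS directory.
    sample = "id,amount,region\n1,9.5,EU\n2,10,US\n"
    print(external_table_ddl("raw.orders", "/data/raw/orders", sample))
```

Because the table is EXTERNAL, dropping it later removes only the metastore entry and leaves the files in HDFS untouched.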

Re: metadata enrichment, data dictionary and data lineage in Hive