Metadata enrichment, data dictionary, and data lineage in Hive

Hive tables can store comments on columns when the table is created.

We use Oozie to load the tables: data is first copied from remote nodes to an edge node and then loaded into Hive tables.

What are the recommended practices for metadata enrichment, building a data dictionary, and maintaining data lineage?
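
For illustration, the column comments mentioned above could be generated from a small data dictionary kept in code. This is only a sketch: the helper name `build_create_table_ddl`, the table `sales.orders`, and its columns are hypothetical and not taken from the thread. It produces a standard Hive `CREATE TABLE` statement with `COMMENT` clauses and does not connect to a cluster.

```python
def build_create_table_ddl(table, columns, table_comment=None):
    """Build a Hive CREATE TABLE statement with column comments.

    columns is a list of (name, hive_type, comment) tuples; comment may be None.
    All names here are hypothetical examples, not part of any Hive API.
    """
    def quote(text):
        # Escape backslashes and single quotes for a Hive string literal.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

    col_lines = []
    for name, hive_type, comment in columns:
        line = f"  `{name}` {hive_type}"
        if comment:
            line += f" COMMENT {quote(comment)}"
        col_lines.append(line)

    ddl = f"CREATE TABLE {table} (\n" + ",\n".join(col_lines) + "\n)"
    if table_comment:
        ddl += f"\nCOMMENT {quote(table_comment)}"
    return ddl + ";"


# Hypothetical data dictionary for one table.
ddl = build_create_table_ddl(
    "sales.orders",
    [
        ("order_id", "BIGINT", "Unique order identifier"),
        ("customer_id", "BIGINT", "Customer's key in the CRM"),
        ("amount", "DECIMAL(10,2)", None),
    ],
    table_comment="Orders loaded daily by Oozie",
)
print(ddl)
```

Once a table is created this way, running `DESCRIBE FORMATTED sales.orders;` in Hive shows the stored comments, so the same dictionary can serve both the DDL and the documentation.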

1 ACCEPTED SOLUTION

Starting in HDP 2.3, Atlas provides mechanisms that automatically create Hive entities (databases, tables, views, columns) through a Hive post-execution hook. These entities can then be decorated with additional metadata by associating Traits with them (think of Traits as complex, attribute-bearing tags).

Atlas also provides an extensible mechanism for creating a taxonomy of business terms. This separate taxonomy can then be associated with specific Hive assets (entities).

For more information, refer to http://incubator.apache.org/projects/atlas.html.

3 REPLIES

We just deployed Waterline Data (http://www.waterlinedata.com/) on a production cluster (HDP 2.2.8). It's a great tool for metadata enrichment, data dictionaries, data lineage, and automatic discovery of HDFS and Hive data, and it runs on top of Hadoop (YARN). It is ready to "speak" with Atlas through an API, and it has a great web UI.

One of the coolest features is the ability to create, through the web UI, an external Hive table from data in HDFS in two clicks.
