metadata enrichment, data dictionary and data lineage in Hive
Labels: Apache Hive, Apache Oozie
Created on ‎10-14-2015 09:16 AM - edited ‎09-16-2022 02:44 AM
Hive tables can store comments on columns when the table is created.
Oozie is used to load the tables: data is first copied from remote nodes to an edge node and then loaded into Hive tables.
What are the recommended practices for metadata enrichment, building a data dictionary, and maintaining data lineage?
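To make the column-comment approach concrete, here is a minimal sketch that renders a Hive `CREATE TABLE` statement from a small data dictionary kept in code. The helper names (`quote`, `create_table_ddl`) and the table and column names are hypothetical and only for illustration; the generated statement relies on Hive's standard `COMMENT` clause for columns and tables.

```python
# Hypothetical helpers: render Hive DDL whose column comments come from a
# small data dictionary. Table and column names below are illustrative only.

def quote(text):
    """Quote a string as a HiveQL literal, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_table_ddl(table, columns, table_comment=""):
    """Build a CREATE TABLE statement.

    columns is a list of (name, hive_type, comment) tuples; an empty comment
    omits the COMMENT clause for that column.
    """
    col_lines = []
    for name, hive_type, comment in columns:
        line = f"  `{name}` {hive_type}"
        if comment:
            line += f" COMMENT {quote(comment)}"
        col_lines.append(line)
    ddl = f"CREATE TABLE {table} (\n" + ",\n".join(col_lines) + "\n)"
    if table_comment:
        ddl += f"\nCOMMENT {quote(table_comment)}"
    return ddl


DDL = create_table_ddl(
    "sales.orders",
    [
        ("order_id", "BIGINT", "Unique order identifier"),
        ("customer_id", "BIGINT", "Key of the customer who placed the order"),
        ("order_ts", "TIMESTAMP", "Time the order was placed (UTC)"),
    ],
    table_comment="Orders loaded by the nightly Oozie workflow",
)
print(DDL)
```

Keeping the dictionary in one place like this means the same definitions can feed both the DDL and any documentation generated from it. Once the table exists, the comments can be read back with `DESCRIBE FORMATTED`.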
Created ‎11-02-2015 03:43 PM
Starting in HDP 2.3, Atlas provides mechanisms for automatically creating Hive entities (databases, tables, views, columns) through a Hive post-execution hook. These entities can then be decorated with additional metadata by associating Traits with them (think of Traits as tags that can carry complex attributes).
Atlas also provides an extensible mechanism for creating a taxonomy of business terms. This separate taxonomy can then be associated with specific Hive assets (entities).
For more information, refer to http://incubator.apache.org/projects/atlas.html.
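The following is a rough conceptual sketch of the model described above: entities registered by a hook, decorated with attribute-bearing traits, and linked to business terms. It is a plain in-memory illustration and not the Atlas API; every class, method, and name in it (`Entity`, `Catalog`, `register`, the `PII` trait, the term path) is an assumption made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A catalogued asset such as a database, table, view, or column."""
    type_name: str                                # illustrative label, e.g. "hive_table"
    qualified_name: str
    traits: dict = field(default_factory=dict)    # trait name -> attribute dict
    terms: set = field(default_factory=set)       # associated business terms


class Catalog:
    """Toy in-memory metadata store (not Atlas)."""

    def __init__(self):
        self.entities = {}

    def register(self, type_name, qualified_name):
        """Stand-in for what a post-execution hook does: create the entity once."""
        entity = self.entities.get(qualified_name)
        if entity is None:
            entity = Entity(type_name, qualified_name)
            self.entities[qualified_name] = entity
        return entity

    def add_trait(self, qualified_name, trait, **attributes):
        """Decorate an entity with a trait that carries its own attributes."""
        self.entities[qualified_name].traits[trait] = attributes

    def assign_term(self, qualified_name, term):
        """Associate a business-taxonomy term with an entity."""
        self.entities[qualified_name].terms.add(term)

    def find_by_trait(self, trait):
        """Return the qualified names of all entities carrying the trait."""
        return sorted(q for q, e in self.entities.items() if trait in e.traits)


catalog = Catalog()
catalog.register("hive_table", "sales.orders")
catalog.register("hive_column", "sales.orders.customer_id")
catalog.add_trait("sales.orders.customer_id", "PII",
                  sensitivity="high", owner="data-governance")
catalog.assign_term("sales.orders", "Finance/Revenue/Orders")
print(catalog.find_by_trait("PII"))
```

The point of the sketch is the separation of concerns: technical entities are created automatically, while traits and business terms are layered on afterwards without touching the Hive tables themselves.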
Created ‎12-10-2015 09:53 PM
We just deployed Waterlinedata (http://www.waterlinedata.com/) on a production cluster (HDP 2.2.8). It's a great tool for metadata enrichment, data dictionary, data lineage, and autodiscovery for HDFS and Hive data, and it runs on top of Hadoop (YARN). It's ready to "speak" with Atlas through its API, and it has a great Web UI.
One of the coolest features is the ability to create, through the Web UI, an external Hive table from HDFS data in two clicks.
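For comparison, this is roughly what the manual equivalent looks like: a sketch that renders a Hive `CREATE EXTERNAL TABLE` statement over an existing HDFS directory. It does not reflect how the tool works internally; the function name, table name, columns, and path are all hypothetical, and the sketch assumes delimited text files.

```python
# Hypothetical helper: build DDL for an external Hive table over existing
# HDFS data. Assumes delimited text files; all names and paths are examples.

def external_table_ddl(table, columns, hdfs_dir, delimiter=","):
    """columns is a list of (name, hive_type) tuples."""
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


EXTERNAL_DDL = external_table_ddl(
    "landing.clicks",
    [("user_id", "BIGINT"), ("url", "STRING"), ("clicked_at", "STRING")],
    "/data/landing/clicks",
)
print(EXTERNAL_DDL)
```

Because the table is external, dropping it removes only the Hive metadata and leaves the files in HDFS untouched.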
Created ‎12-10-2015 09:59 PM
