Does Apache Atlas display lineage for HDFS locations (Atlas 0.5)?

Expert Contributor

Hello,

I created a managed Hive table and loaded data into it with the "load data inpath" command, but I don't see any lineage diagram or icon in the Atlas UI. I expected the Atlas UI to show lineage for the HDFS location as well, since that location is the source for the Hive table in this example.

My data file is stored in HDFS.

3 REPLIES
Re: Does Apache Atlas display lineage for HDFS locations (Atlas 0.5)?

Contributor

Hi Manoj, I am having a similar problem and am digging into it myself now; I will report back. In case you have not seen it, there is an excellent presentation on Atlas and its roadmap from Hadoop Summit Dublin last month that may help. The presenters mention a few minor Hive issues they are working on.

http://hadoopsummit.org/dublin/agenda/ "Apache Atlas: Tracking dataset lineage across Hadoop components" (slides and video)

direct link to video - apache atlas Tracking dataset lineage across hadoop comp

Re: Does Apache Atlas display lineage for HDFS locations (Atlas 0.5)?

Expert Contributor

Thanks for replying, Dennis.

Re: Does Apache Atlas display lineage for HDFS locations (Atlas 0.5)?

Explorer

Hi @Manoj Dhake,

I've also been exploring Atlas 0.5 and don't believe that this type of lineage is natively supported. Atlas 0.6 adds cross-component lineage, which will allow tracking of data ingested into Hive through Sqoop. A demo of that can be found here: http://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/

With Atlas 0.5 you can create 'Process' types through the REST API, or use the LoadProcess type that is defined by the QuickStart.py script in /usr/hdp/current/atlas-server/bin. Through the REST API you can link two entities whose types have the super type 'DataSet'. If you define a type to reference your HDFS file, such as the type 'Table' with super type 'DataSet', you can then link an entity of that type to the Hive table entity.
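To make the linking idea concrete, here is a minimal sketch of the kind of JSON payload a LoadProcess entity might carry, with the HDFS file as input and the Hive table as output. Everything specific in it is an assumption and should be checked against the REST API docs and the type definitions on your cluster: the `jsonClass` serialization layout, the attribute names (`name`, `userName`, `inputs`, `outputs`), the hypothetical `HdfsFile` type, and the placeholder GUIDs. The sketch only builds and prints the payload; it does not call Atlas.

```python
import json

# Assumed class names for the Atlas 0.5 JSON serialization; verify before use.
REF = "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference"
ID = "org.apache.atlas.typesystem.json.InstanceSerialization$_Id"


def entity_id(guid, type_name):
    """Reference to an entity by GUID (field layout is an assumption)."""
    return {"jsonClass": ID, "id": guid, "version": 0, "typeName": type_name}


def load_process(name, user, inputs, outputs):
    """Build a LoadProcess entity that links input DataSets to output DataSets."""
    return {
        "jsonClass": REF,
        # "-1" is a placeholder id for an entity that does not exist yet.
        "id": entity_id("-1", "LoadProcess"),
        "typeName": "LoadProcess",
        # Attribute names here are assumed, not taken from the thread.
        "values": {
            "name": name,
            "userName": user,
            "inputs": inputs,
            "outputs": outputs,
        },
        "traitNames": [],
        "traits": {},
    }


# Hypothetical GUIDs; in practice these come from entities already in Atlas.
hdfs_file = entity_id("hdfs-file-guid", "HdfsFile")   # hypothetical custom type
hive_table = entity_id("hive-table-guid", "hive_table")

payload = load_process("load data inpath", "manoj", [hdfs_file], [hive_table])
print(json.dumps(payload, indent=2))
```

The resulting JSON would then be submitted to the entity-creation call described in the REST API docs linked below, after the `HdfsFile` type (or whichever DataSet subtype you choose) has been registered.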

If you have some familiarity with REST APIs, I had the most success learning the API calls by reading the REST API docs:

http://atlas.incubator.apache.org/api/rest.html

and the actual type and entity Java source code:

https://github.com/apache/incubator-atlas/blob/branch-0.5-incubating/webapp/src/main/java/org/apache...

https://github.com/apache/incubator-atlas/blob/branch-0.5-incubating/webapp/src/main/java/org/apache...

I hope this helps.

Best regards,

John
