Created 07-23-2017 02:51 PM
Hi
I want to learn Atlas and eventually have a system (Spark) export metadata to Atlas, including lineage. I have downloaded HDP 2.6 and tried to follow the Cross Component Lineage tutorial.
Unfortunately, the download link in section 3.2 of the tutorial no longer works, and the directory and scripts in section 4.1 no longer exist.
Can anyone help me get started with Atlas and point me to what I should look into to import metadata from Spark into Atlas? Would I need to modify Atlas in any way, or is it more a matter of defining entities in Atlas?
Thank you
Created 07-23-2017 03:37 PM
@Arsalan Siddiqi A good place to start would be the Quick Start Tutorial.
For your work, it will help to learn about the type system. This will help you create types specific to what you are trying to export.
The detailed REST API documentation will also help.
If you are developing a hook for Spark, take a look at the code for the existing Hive Hook.
Feel free to post on HCC, we are here to help!
Created 07-23-2017 04:13 PM
@Ashutosh Mestry Thanks for the instant reply. I will look into the links you have shared. Is there any guide on how to run this tutorial, with some explanation? All I get from the page is to run the script and it will set up the metadata. I guess in HDP it is located under '/usr/hdp/current/atlas-server/bin/quick_start.py'. I did load the data and it is visible in Atlas. Is there any explanation of the example?
Created 07-25-2017 11:08 AM
@Arsalan Siddiqi The quick_start sample aims to demonstrate the use of the type system, entity creation, creation of lineage, and then search.
Though Atlas provides out-of-the-box types for Hive, Falcon, etc., it also allows you to create your own types.
Once types are created, entities of those types can be created. Think of entities as instances of types.
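To make the type/entity relationship concrete, here is a minimal sketch that only builds the JSON payloads, assuming the Atlas v2 REST API (types posted to /api/atlas/v2/types/typedefs, entities to /api/atlas/v2/entity). The type name spark_dataset_demo, its format attribute, and the qualifiedName value are hypothetical names chosen for illustration; they are not part of quick_start.

```python
import json


def make_entity_def(name, super_types, attributes):
    """Build one item for the 'entityDefs' list of a typedefs payload.

    attributes is a list of (attribute_name, attribute_type) pairs.
    """
    return {
        "category": "ENTITY",
        "name": name,
        "superTypes": list(super_types),
        "attributeDefs": [
            {
                "name": attr_name,
                "typeName": attr_type,
                "isOptional": True,
                "cardinality": "SINGLE",
                "isUnique": False,
                "isIndexable": False,
            }
            for attr_name, attr_type in attributes
        ],
    }


def make_entity(type_name, qualified_name, name, **extra):
    """Build an entity payload: an instance of an already registered type."""
    attrs = {"qualifiedName": qualified_name, "name": name}
    attrs.update(extra)
    return {"entity": {"typeName": type_name, "attributes": attrs}}


# Hypothetical type that extends the built-in DataSet type.
typedefs = {
    "entityDefs": [
        make_entity_def("spark_dataset_demo", ["DataSet"], [("format", "string")])
    ]
}

# Hypothetical instance of that type.
entity = make_entity(
    "spark_dataset_demo", "demo.events@cluster1", "events", format="parquet"
)

print(json.dumps(typedefs, indent=2))
print(json.dumps(entity, indent=2))
```

The payloads would then be sent with any HTTP client; the type has to be registered before an entity of that type can be created.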
In quick_start specifically, the entities depend on each other, which shows how lineage works (see sales_fact).
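Lineage in Atlas comes from process entities: a type that extends the built-in Process type carries inputs and outputs attributes that reference DataSet entities. A rough sketch of such a payload follows, again assuming the v2 REST API; the type names spark_process_demo and spark_dataset_demo and all qualifiedName values are hypothetical.

```python
import json


def object_ref(type_name, qualified_name):
    """Reference an existing entity by type and unique attribute instead of GUID."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def make_process(type_name, qualified_name, name, inputs, outputs):
    """Build a process entity payload that links input datasets to outputs."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "qualifiedName": qualified_name,
                "name": name,
                "inputs": list(inputs),
                "outputs": list(outputs),
            },
        }
    }


# Hypothetical job that reads one dataset and writes another.
process = make_process(
    "spark_process_demo",
    "demo.daily_load@cluster1",
    "daily_load",
    inputs=[object_ref("spark_dataset_demo", "demo.events@cluster1")],
    outputs=[object_ref("spark_dataset_demo", "demo.events_daily@cluster1")],
)

print(json.dumps(process, indent=2))
```

Once the process entity exists, the lineage graph for either dataset can be read back from /api/atlas/v2/lineage/{guid}, where the GUID is the one Atlas assigned to the dataset.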
One key highlight is the use of tags, which let you group entities that are semantically related. See how the PII tag is used, and how all the load operations show up once the ETL tag is selected.
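In the v2 REST API, tags are modeled as classifications. The sketch below builds the two payloads involved, assuming a classification is registered through the classificationDefs list of /api/atlas/v2/types/typedefs and attached with a POST to /api/atlas/v2/entity/guid/{guid}/classifications. The description text is made up for illustration.

```python
import json


def make_classification_def(name, description=""):
    """Build one item for the 'classificationDefs' list of a typedefs payload."""
    return {
        "category": "CLASSIFICATION",
        "name": name,
        "description": description,
        "superTypes": [],
        "attributeDefs": [],
    }


def classifications_payload(*tag_names):
    """Build the request body that attaches one or more tags to an entity."""
    return [{"typeName": tag} for tag in tag_names]


# Register the tags once, then attach them to individual entities.
tag_typedefs = {
    "classificationDefs": [
        make_classification_def("PII", "Personally identifiable information"),
        make_classification_def("ETL", "Load and transform operations"),
    ]
}
attach_body = classifications_payload("PII")

print(json.dumps(tag_typedefs, indent=2))
print(json.dumps(attach_body))
```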
Hope this helps!