Atlas 0.5: Current Functionalities

Hello,

I am wondering about the current Atlas functionality and the plans for Atlas moving forward. I have a project for which I would like to use Atlas, but I'm not sure whether it is the best tool for the job or whether the current functionality supports the required tasks. Specifically, can Atlas do the following, and if so, how?

- Manage metadata for unstructured data that does not pass through Hive

- Import data profiling results from third party software

- Track element level lineage or the lineage of particular columns of tables

- Implement global rule definitions into Atlas

Could Falcon be used for some of these?

I'm currently using Atlas 0.5 for testing and imagine the same version will be used throughout the project.

Thanks,

John

1 ACCEPTED SOLUTION

Hi @John Yawney.

So answering your questions one at a time:

1) Atlas 0.6 (that version or a later one is expected in the next HDP release) currently supports hooks for Hive, Sqoop, Falcon and Storm, so it tracks governance information for data that is touched by those systems. Hooks for Spark, NiFi and HBase are expected around the end of the year. For now, anything that doesn't have a hook won't be tracked.

There's no public information on the time frame for the next release, but historically we have regularly announced new releases around the US Hadoop Summit.

This gives a good idea of what is coming down the line for Atlas:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62695330

2) Atlas is designed to be open and extensible, so you could absolutely add 3rd party lineage information into the metastore. Any work you do in that area would be greatly appreciated if contributed upstream, with the added benefit that once it is accepted you won't need to support that code on your own.

3) I assume you're talking about Hive here? Column-level lineage is expected around the end of the year.

4) You can, usually in combination with Ranger, but you'll need to be more specific about exactly what you mean by these rules.

Additional notes:

For more detailed information, since I know the documentation around Atlas is pretty poor, I'd strongly advise watching the sessions from the recent European Hadoop Summit. Search for the sessions by Andrew Ahn (there are three!):

http://www.hadoopsummit.org/dublin/agenda/

Hi @drussell,

Thank you very much for the help and for the informative response. It certainly sounds like there is a lot in store for Atlas. I'm not too sure of the exact extent of my project yet, but I believe the data rules would be mainly for data validation: perhaps automatically tagging or flagging invalid entries, and/or categorizing records based on certain statistics (for example low, medium or high spending). I've been exploring the Atlas-Ranger tech preview and have been trying to install 0.6 on our cluster. It seems like this would be possible in 0.6, but I'm not completely sure.
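
As a rough illustration of the kind of rule described above, here is a minimal Python sketch that flags invalid entries and sorts valid ones into low, medium or high spending. Everything in it is an assumption made for illustration: the thresholds, the tag names and the function names are hypothetical, and it does not call any Atlas or Ranger API. Attaching the resulting tags to entities in Atlas would be a separate step that is not shown here.

```python
from typing import Dict, Optional

# Hypothetical thresholds; the thread does not define any.
LOW_MAX = 100.0
MEDIUM_MAX = 1000.0


def spending_tag(amount: Optional[float]) -> str:
    """Return a (hypothetical) tag name for a single spending value."""
    # Missing or non-numeric values are treated as invalid entries.
    if amount is None or isinstance(amount, bool):
        return "INVALID"
    if not isinstance(amount, (int, float)):
        return "INVALID"
    # NaN is the only float that is not equal to itself.
    if amount != amount or amount < 0:
        return "INVALID"
    if amount <= LOW_MAX:
        return "LOW_SPENDING"
    if amount <= MEDIUM_MAX:
        return "MEDIUM_SPENDING"
    return "HIGH_SPENDING"


def tag_records(records: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Map each record id to the tag chosen for its spending value."""
    return {rec_id: spending_tag(amount) for rec_id, amount in records.items()}
```

Calling `tag_records({"a": 50, "b": None})` would tag record `a` as low spending and record `b` as invalid; in practice the thresholds would come from whatever profiling statistics the project settles on.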

Thanks again,

John