Created 05-30-2016 01:11 PM
Do Atlas/Ranger-based lineage and tag-based access control keep working when a Hive table is processed with an 'external' ETL tool such as SAS DI or SAP BODS and then written back to a (different) Hive table?
I would guess this is only possible when the metadata from these ETL tools can be synced with Atlas/Ranger.
http://hortonworks.com/apache/atlas/#section_1 tells me that:
Atlas, at its core, is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements.
...so if I am reading this correctly, the functionality I am looking for is at least 'on the roadmap'...
Created 05-31-2016 08:39 AM
You're correct: depending on exactly how the data is manipulated, those steps currently wouldn't be tracked as lineage. Some of the Hive lineage may still be tracked, depending on how those tools integrate with the data, for example via the Hive service.
SAS and an increasing number of partners, customers and community members are part of the Data Governance Initiative (DGI). You can reasonably expect those members of the DGI to be first in the queue to have their solutions more integrated into Atlas for the shared metadata exchange.
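For steps that never pass through the Hive service, one way to keep the lineage graph connected is for the ETL job (or a small wrapper around it) to register its own input/output relationship with Atlas. The Python sketch below is illustrative only and rests on several assumptions: an Atlas release that exposes the v2 REST endpoint `/api/atlas/v2/entity` (newer than what was current when this thread was written), source and target tables already imported by the Hive hook under `<db>.<table>@<cluster>` qualified names, and the generic `Process` type from the Atlas base model. The job name, cluster name, URL and credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def hive_table_ref(db: str, table: str, cluster: str) -> dict:
    """Reference an existing hive_table entity by its unique qualifiedName.

    Assumes the '<db>.<table>@<cluster>' naming used by the Hive hook.
    """
    return {
        "typeName": "hive_table",
        "uniqueAttributes": {"qualifiedName": f"{db}.{table}@{cluster}"},
    }


def build_etl_lineage(job_name: str, cluster: str, inputs, outputs) -> dict:
    """Build an Atlas v2 entity payload describing one external ETL step.

    inputs/outputs are lists of (database, table) tuples; the Process entity
    links them so Atlas can draw lineage across the external tool.
    """
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                # Hypothetical naming scheme for the process itself.
                "qualifiedName": f"{job_name}@{cluster}",
                "name": job_name,
                "inputs": [hive_table_ref(db, t, cluster) for db, t in inputs],
                "outputs": [hive_table_ref(db, t, cluster) for db, t in outputs],
            },
        }
    }


def post_to_atlas(payload: dict, atlas_url: str, user: str, password: str) -> dict:
    """POST the payload to the (assumed) Atlas v2 entity endpoint using basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{atlas_url}/api/atlas/v2/entity",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical SAS DI job reading staging.customers_raw and writing curated.customers.
    payload = build_etl_lineage(
        "sas_di_customer_clean", "prod",
        inputs=[("staging", "customers_raw")],
        outputs=[("curated", "customers")],
    )
    print(json.dumps(payload, indent=2))
```

Building the payload separately from sending it keeps the mapping from ETL job to Atlas entity easy to inspect before anything is written. Whether Ranger tag-based policies then follow the data to the output table depends on the Atlas and Ranger versions in use; classification propagation along lineage only arrived in later Atlas releases.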
Hope that helps.
Created 05-31-2016 02:17 PM
Hi @Dave Russell, thank you for your answer.
I have a follow-up question to your response: do you know whether there is more 'native Hadoop-like' ETL tooling available at the moment that generates pure MapReduce/Hive/Pig scripts under the hood, so that this kind of lineage and security inheritance is not lost?
I am thinking of tooling from companies like Talend or Syncsort DMX-h here...
Created 07-14-2016 05:56 PM
Hello all,
I have the same question as @Dave Russell, and I hope that companies like Talend make this available as soon as possible.
Thanks