Atlas based lineage / Ranger based access & COTS ETL tools

New Contributor

Do Atlas based lineage and Ranger tag based access control keep working when a Hive table is processed with an 'external' ETL tool like SAS DI or SAP BODS and then written back to a (different) Hive table?

I would guess this is only possible when the metadata from these ETL tools can be synced with Atlas / Ranger.

http://hortonworks.com/apache/atlas/#section_1 tells me that:

Atlas, at its core, is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements

...so if I am reading this correctly, the functionality I seek is at least 'on the roadmap'...

1 ACCEPTED SOLUTION

Re: Atlas based lineage / Ranger based access & COTS ETL tools

Hi @rogier werschkull.

You're correct: depending on exactly how the data is manipulated, those steps currently wouldn't be tracked as lineage. Some of the Hive lineage may still be tracked, depending on how those tools integrate with the data, for example if they read and write via the Hive service.

SAS and an increasing number of partners, customers and community members are part of the Data Governance Initiative (DGI). You can reasonably expect those DGI members to be first in the queue to have their solutions integrated more tightly with Atlas for shared metadata exchange.
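
To illustrate what "exchanging metadata" with Atlas could look like for a step that runs outside Hive, here is a minimal sketch in Python. It only builds the JSON body describing an external ETL job as a lineage step between two Hive tables. It assumes an Atlas release that offers the v2 REST API (entities are created with a POST to `/api/atlas/v2/entity`) and the built-in `Process` and `hive_table` types; the database, table, cluster and job names are hypothetical, and this is not how SAS DI or SAP BODS integrate today.

```python
import json


def hive_table_ref(db, table, cluster):
    """Reference an existing hive_table entity by its qualifiedName.

    Assumes the Hive hook's naming convention "db.table@cluster".
    """
    return {
        "typeName": "hive_table",
        "uniqueAttributes": {"qualifiedName": f"{db}.{table}@{cluster}"},
    }


def build_etl_process_entity(job_name, inputs, outputs, cluster):
    """Build the request body for a generic Process entity.

    A Process with inputs and outputs is what Atlas draws as a lineage
    edge. The qualifiedName scheme used here is an assumption; any
    unique string would do.
    """
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "qualifiedName": f"{job_name}@{cluster}",
                "name": job_name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    source = hive_table_ref("sales", "orders_raw", "prod")
    target = hive_table_ref("sales", "orders_clean", "prod")
    payload = build_etl_process_entity(
        "external_etl_orders_job", [source], [target], "prod"
    )
    # This body would be sent to <atlas-host>/api/atlas/v2/entity.
    print(json.dumps(payload, indent=2))
```

Whether tags on the source table then carry over to the target table, and so drive Ranger tag based policies, depends on the Atlas version and its tag propagation settings, so treat this purely as a sketch of the lineage part.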

Hope that helps.

Re: Atlas based lineage / Ranger based access & COTS ETL tools

New Contributor

Hi @Dave Russell, thank you for your answer.

I have a follow-up question: do you know whether there is more 'native Hadoop-like' ETL tooling available at the moment that generates pure MapReduce / Hive / Pig scripts 'under the hood', so that this kind of lineage and security inheritance is not lost?

I am thinking about tooling from companies like Talend or Syncsort DMX-h here...

Re: Atlas based lineage / Ranger based access & COTS ETL tools

Explorer

Hello all,

I have the same question as @Dave Russell, and I expect that companies like Talend will make this available as soon as possible.

Thanks