Support Questions
Find answers, ask questions, and share your expertise

Export Metadata & Lineage from Atlas

New Contributor

Hi,

for a client we have to integrate Atlas (HDP 2.5) into their Enterprise Metadata Management tool which is Informatica Metadata Manager (IMM) currently in version 9.6.1.

Given there is currently no direct integration available from Informatica, we are trying to implement a tactical solution by exporting the metadata from Atlas into a spreadsheet (provided by Informatica) that can be imported into IMM 9.6.1.

What is the best way of doing that? We thought about exporting from HBase or using the Atlas APIs (as per the Atlas Technical User Guide).

Has anyone tried that already? Anything that would be available?

Kind Regards, Markus Maus

1 ACCEPTED SOLUTION

Super Guru

@Markus Maus Atlas integration is highly flexible: you can use REST or Kafka. HDP partner tools integrate with Atlas using hooks; others simply consume and publish all metadata using Kafka. I would not go directly against HBase. Use a Kafka topic to publish and consume metadata, or use the REST API. Both options are documented in the Apache Atlas Technical User Guide: http://atlas.incubator.apache.org/AtlasTechnicalUserGuide.pdf

There are two types of messages for which Kafka is used. Each type is written to a specific topic in Kafka.

● Publishing entity changes to Atlas: These messages are passed to Atlas from the metadata sources where the metadata is originally created, updated, or deleted. These messages are written to a topic called ATLAS_HOOK. Typically, these metadata sources are other components in the Hadoop ecosystem. As of Atlas 0.7-incubating, Hive, Sqoop, Falcon, and Storm integrate with Atlas this way.

● Consuming entity changes from Atlas: These messages are passed from Atlas to external consumers that might be interested in changes to metadata. An example of such a consumer in the current Hadoop ecosystem is Apache Ranger. By capturing metadata change events in real time, Ranger provides policy-driven security management of Hadoop data assets. These messages are written to a topic called ATLAS_ENTITIES.
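As a rough illustration of the second pattern, here is a minimal sketch of how an external consumer could parse entity-change notifications read from the ATLAS_ENTITIES topic. The notification envelope shape (`message` → `operationType` / `entity` → `typeName` / `values`) is an assumption modeled on Atlas 0.7-style JSON notifications, and the `parse_entity_notification` helper is hypothetical; verify the field names against your own cluster.

```python
import json

# Parse one Atlas notification message from the ATLAS_ENTITIES topic.
# NOTE: the envelope shape ("message" -> "operationType"/"entity") is an
# assumption based on Atlas 0.7-style notifications; check it against
# the messages your cluster actually emits.
def parse_entity_notification(raw_value):
    payload = json.loads(raw_value)
    msg = payload.get("message", payload)
    entity = msg.get("entity", {})
    return {
        "operation": msg.get("operationType"),
        "type": entity.get("typeName"),
        "qualifiedName": entity.get("values", {}).get("qualifiedName"),
    }

# Wiring this to Kafka (requires a running broker and the kafka-python
# package) would look roughly like:
#
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer("ATLAS_ENTITIES",
#                            bootstrap_servers="broker-host:6667")
#   for record in consumer:
#       print(parse_entity_notification(record.value))

# Self-contained demo with a hand-built sample message:
sample = json.dumps({
    "message": {
        "operationType": "ENTITY_CREATE",
        "entity": {
            "typeName": "hive_table",
            "values": {"qualifiedName": "default.customers@cluster1"},
        },
    }
})
print(parse_entity_notification(sample))
```

Because the consumer sees every change as it happens, this route suits keeping IMM continuously in sync rather than doing one-off exports.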


5 REPLIES


New Contributor

Thanks @Sunile Manjee. Is there any sample available on how HBase could be queried to extract the metadata directly? Couldn't find it in the technical user guide.

Super Guru

@Markus Maus I do not recommend going directly against HBase. Use Kafka or the API. Can you describe why you are not using Kafka or the API?

New Contributor

@Sunile Manjee - We haven't assessed the export via the APIs yet. We are just trying to find the easiest way to export the information from Atlas. Is there any sample code available for the export via the API that we could use?

Super Guru

Yes, on my GitHub repo:

https://github.com/sunileman/Atlas-API-Examples

And the Atlas technical guide has examples for both the API and Kafka. I recommend using these integration patterns instead of going against HBase directly.

Using Kafka you can consume any metadata changes as they occur. The REST API is good for consuming metadata as well, but obviously does not act as a messaging service.
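For the one-off spreadsheet route the thread started with, a tactical sketch could pull entities over the Atlas REST API (v1 endpoints such as `/api/atlas/entities/<guid>` on port 21000 in Atlas 0.7 / HDP 2.5) and flatten them into CSV rows. The `entity_to_row` helper and the column layout below are illustrative assumptions, not the actual Informatica IMM template; map the columns to the real spreadsheet Informatica provided.

```python
import csv
import io

# Fetching one entity from Atlas 0.7 (HDP 2.5) would look roughly like:
#   from urllib.request import urlopen
#   raw = urlopen("http://atlas-host:21000/api/atlas/entities/<guid>").read()
# The JSON shape below ("definition" -> "values") follows the v1 API;
# the CSV columns are placeholders, not the real IMM template layout.
def entity_to_row(entity_json):
    definition = entity_json.get("definition", {})
    values = definition.get("values", {})
    return {
        "name": values.get("name"),
        "qualifiedName": values.get("qualifiedName"),
        "type": definition.get("typeName"),
        "description": values.get("description") or "",
    }

def write_csv(rows):
    # Flatten the extracted rows into CSV text ready for import.
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["name", "qualifiedName", "type", "description"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Self-contained demo with a hand-built sample entity:
sample_entity = {
    "definition": {
        "typeName": "hive_table",
        "values": {"name": "customers",
                   "qualifiedName": "default.customers@cluster1"},
    }
}
print(write_csv([entity_to_row(sample_entity)]))
```

Pagination and lineage would still need the search and lineage endpoints; this only shows the entity-to-row flattening step.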
