Created 11-16-2016 04:44 AM
Hi,
for a client we have to integrate Atlas (HDP 2.5) into their Enterprise Metadata Management tool which is Informatica Metadata Manager (IMM) currently in version 9.6.1.
Given there is currently no direct integration available from Informatica, we are trying to implement a tactical solution by exporting the metadata from Atlas into a spreadsheet (provided by Informatica) that can be imported into IMM 9.6.1.
What is the best way of doing that? We thought about exporting from HBase or using the Atlas APIs (as per the Atlas Technical User Guide).
Has anyone tried that already? Anything that would be available?
Kind Regards, Markus Maus
Created 11-16-2016 05:11 AM
@Markus Maus Atlas integration is highly flexible: you can use REST or Kafka. Tools from HDP partners integrate with Atlas using hooks. Everyone else can simply publish and consume metadata through the Kafka topics, or use the REST API. I would not go directly against HBase. Both options are documented in the Apache Atlas technical user guide: http://atlas.incubator.apache.org/AtlasTechnicalUserGuide.pdf
There are two types of messages for which Kafka is used. Each type is written to a specific topic in Kafka.
● Publishing entity changes to Atlas: These messages are passed to Atlas from the metadata sources where the metadata is originally created, updated or deleted. These messages are written to a topic called ATLAS_HOOK. Typically, these metadata sources are other components in the Hadoop ecosystem. As of Atlas 0.7-incubating, Atlas integrates with Hive, Sqoop, Falcon and Storm.
● Consuming entity changes from Atlas: These messages are passed from Atlas to external consumers who might be interested in changes to metadata. An example of such a consumer in the current Hadoop ecosystem is Apache Ranger. By capturing metadata change events in real time, Ranger provides policy-driven security management of Hadoop data assets. These messages are written to a topic called ATLAS_ENTITIES.
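As a rough illustration of the consuming side, here is a minimal Python sketch that reads ATLAS_ENTITIES and flattens each notification into a row. Everything beyond the topic name is an assumption: the field names (`operationType`, `entity`, `values`, and an optional `message` envelope) reflect how notifications may look on your cluster and should be checked against a real message, the consumer group name `imm-export` is made up, and `consume` relies on the third-party kafka-python package.

```python
import json


def notification_to_row(raw):
    """Flatten one ATLAS_ENTITIES message (bytes or str) into a flat dict.

    The JSON field names used here are assumptions; inspect a real
    message from your cluster and adjust them as needed.
    """
    doc = json.loads(raw)
    # Some versions may wrap the payload in an envelope; unwrap it if present.
    body = doc.get("message", doc)
    entity = body.get("entity") or {}
    entity_id = entity.get("id") or {}
    values = entity.get("values") or {}
    guid = entity_id.get("id", "") if isinstance(entity_id, dict) else str(entity_id)
    return {
        "operation": body.get("operationType", ""),
        "type": entity.get("typeName", ""),
        "guid": guid,
        "name": values.get("name", ""),
        "qualified_name": values.get("qualifiedName", ""),
    }


def consume(bootstrap_servers, topic="ATLAS_ENTITIES"):
    """Yield one flat row per notification read from the topic.

    Needs the third-party kafka-python package; it is imported lazily so
    that notification_to_row can be used and tested without it.
    """
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        group_id="imm-export",          # hypothetical consumer group name
        auto_offset_reset="earliest",   # start from the oldest retained message
    )
    for record in consumer:
        yield notification_to_row(record.value)
```

Note that a consumer like this only sees changes that are still retained in the topic, so it suits ongoing synchronisation better than a one-off full export.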
Created 11-17-2016 12:03 AM
Thanks @Sunile Manjee. Is there any sample available on how HBase could be queried to extract the metadata directly? I couldn't find one in the technical user guide.
Created 11-17-2016 05:07 AM
@Markus Maus I do not recommend going directly against HBase. Use Kafka or the API. Can you describe why you are not using Kafka or the API?
Created 11-17-2016 05:50 AM
@Sunile Manjee - We haven't assessed the export via APIs yet. We are just trying to find the easiest way to export the information from Atlas. Is there any sample code available for the export via API that we could use?
Created 11-17-2016 06:24 AM
Yes on my github repo
https://github.com/sunileman/Atlas-API-Examples
And the Atlas technical guide has examples of the API and Kafka. I recommend using these integration patterns instead of going against HBase directly.
Using Kafka you can consume any metadata changes as they occur. The REST API is good for consuming metadata as well, but obviously does not act as a messaging service.
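For a one-off export into a spreadsheet, the REST route could look roughly like the Python sketch below. Treat it as a starting point under stated assumptions rather than a tested integration: it assumes the entity endpoints `/api/atlas/entities?type=...` and `/api/atlas/entities/{guid}`, a response shape with `results` and `definition` keys, an Atlas server at `localhost:21000` with basic auth `admin`/`admin`, and a made-up set of CSV columns. The column layout Informatica actually expects has to come from their template.

```python
import base64
import csv
import io
import json
import urllib.parse
import urllib.request

ATLAS_URL = "http://localhost:21000"  # assumption: adjust to your Atlas host
COLUMNS = ("guid", "type", "name", "qualified_name")  # hypothetical layout


def http_get_json(url, user="admin", password="admin"):
    """GET a URL with basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": "Basic " + token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def export_type(type_name, fetch=http_get_json, base_url=ATLAS_URL):
    """Return one flat row per entity of the given type.

    Endpoint paths and response keys are assumptions; verify them
    against your Atlas version before relying on this.
    """
    query = urllib.parse.quote(type_name)
    listing = fetch(f"{base_url}/api/atlas/entities?type={query}")
    rows = []
    for guid in listing.get("results", []):
        definition = fetch(f"{base_url}/api/atlas/entities/{guid}").get("definition", {})
        if isinstance(definition, str):
            # Some versions may return the definition as a JSON-encoded string.
            definition = json.loads(definition)
        values = definition.get("values", {})
        rows.append({
            "guid": guid,
            "type": definition.get("typeName", type_name),
            "name": values.get("name", ""),
            "qualified_name": values.get("qualifiedName", ""),
        })
    return rows


def rows_to_csv(rows, columns=COLUMNS):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv(export_type("hive_table"))` would then give CSV text that can be pasted or mapped into the import spreadsheet. The `fetch` parameter is injectable so the mapping logic can be tried out without a running Atlas server.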