
Tracking of Hive tables metadata changes in real time using Atlas

Solved

Super Collaborator

Hi everyone,

I am using HDP 2.6 and want to track Hive table metadata changes in real time. I have the HiveHook enabled and can see Kafka JSON messages in the ATLAS_HOOK and ATLAS_ENTITIES topics, and Atlas is able to consume these entity updates. I am looking for the best way to get entity-update information in real time.

1) Is there a way to create a NotificationServer (like SMTP) to which Atlas will send these updates?

2) Or do I have to create a custom Kafka consumer that reads data directly from ATLAS_HOOK or ATLAS_ENTITIES topics in JSON?

P.S. - I do not want to read everything from the Kafka topic. There are thousands of tables, but I want metadata changes for specific tables only. Please let me know how to set up the offsets so they relate to particular databases/tables only. Thanks

1 ACCEPTED SOLUTION

Re: Tracking of Hive tables metadata changes in real time using Atlas

@Mushtaq Rizvi

As you already know, in addition to the API, Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. There is no other Notification Server capability like SMTP. You would have to write your own filtering through events for those tables that you are interested. That is your presented option 2.
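To sketch what that client-side filtering could look like, here is a minimal, stdlib-only Python example. The JSON layout used below is an assumption based on the Atlas 0.8 (HDP 2.6) ENTITY_NOTIFICATION format, and the table names and cluster name are made up; check it against the actual payloads you see on your ATLAS_ENTITIES topic before relying on it.

```python
import json

# Hypothetical watch list: fully qualified Hive table names you care about.
WATCHED_TABLES = {"sales_db.orders@mycluster", "sales_db.customers@mycluster"}

def is_watched(raw_message, watched=WATCHED_TABLES):
    """Return (operationType, qualifiedName) if this Atlas entity
    notification refers to a hive_table on the watch list, else None.
    The message layout is an assumption based on the Atlas 0.8
    (HDP 2.6) notification format."""
    msg = json.loads(raw_message).get("message", {})
    entity = msg.get("entity", {})
    if entity.get("typeName") != "hive_table":
        return None
    qualified_name = entity.get("attributes", {}).get("qualifiedName", "")
    if qualified_name in watched:
        return msg.get("operationType"), qualified_name
    return None

# Synthetic sample message, for illustration only:
sample = json.dumps({
    "version": {"version": "1.0.0"},
    "message": {
        "operationType": "ENTITY_UPDATE",
        "entity": {
            "typeName": "hive_table",
            "attributes": {"qualifiedName": "sales_db.orders@mycluster"},
        },
    },
})

print(is_watched(sample))  # -> ('ENTITY_UPDATE', 'sales_db.orders@mycluster')

# In production you would feed messages from Kafka instead, e.g. with the
# kafka-python client (not shown running here, since it needs a broker):
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer("ATLAS_ENTITIES", bootstrap_servers="broker:6667")
#   for record in consumer:
#       hit = is_watched(record.value.decode("utf-8"))
#       if hit:
#           ...  # notify / act on the metadata change
```

The key point is that the consumer reads every message on the topic and discards the ones it does not care about; there is no server-side, per-table subscription.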

You may not like it, but this is the best answer as of now. If you had NiFi, you could easily build that notification service by filtering the events against a lookup list of tables. With the latest versions of NiFi you can take advantage of powerful processors such as LookupRecord and QueryRecord, as well as processors for SMTP, email, etc.


REPLIES



Re: Tracking of Hive tables metadata changes in real time using Atlas

Super Collaborator

Thank you so much, @Constantin Stanca. This was very much needed. Could you please let me know which NiFi version is compatible with HDP 2.6? I already have an HDP cluster installed on Google Cloud and cannot install a separate HDF cluster. Is there a standalone jar of NiFi that can work in an HDP cluster?


Re: Tracking of Hive tables metadata changes in real time using Atlas

@Mushtaq Rizvi

Yes. Please follow the instructions on how to add HDF components to an existing HDP 2.6.1 cluster:

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1/bk_installing-hdf-on-hdp/content/upgrading_...

This is not the latest HDF, but it is compatible with HDP 2.6.1; I was quite happy with its stability and recommend it.

You would be able to add Apache NiFi 1.5, as well as Schema Registry. NiFi Registry is part of the latest HDF 3.1.x; however, you would have to install it in a separate cluster, and it is not worth the effort for what you are trying to achieve right now. I would proceed with an HDP upgrade when you are ready for HDF 3.2, which will probably launch in the next couple of months.

In case you can't add another node to your cluster for NiFi, try to use one of the nodes that has low CPU utilization and some disk available for NiFi's lineage data storage. It depends on how much lineage you want to preserve, but several tens of GB should be fine for starters.

If this response helped, please vote and accept answer.
