Can we build Hive Hook in Python ?

New Contributor

Hi,

To capture real-time events occurring in Hive, I am thinking of writing a Hive hook. As I am not well versed in Java, can I use Python to build the Hive hook?

If I can't, how can I implement or customize the Atlas Hive hook written in Java?

https://github.com/apache/atlas/blob/master/addons/hive-bridge/src/main/java/org/apache/atlas/hive/h...

Does the Apache Airflow Hive hook work in a similar fashion?

https://pythonhosted.org/airflow/_modules/hive_hooks.html

Thank you in advance,

Subash

3 REPLIES

Re: Can we build Hive Hook in Python ?

Expert Contributor

@subash sharma,

There is already a Hive hook written and integrated well in Hive from Ambari.

What is the purpose behind rewriting it in Python?

Re: Can we build Hive Hook in Python ?

New Contributor

Hey @Nixon Rodrigues,

The Hive hook generally captures changes occurring in Hive, but it doesn't capture anything when we run a "select". Let's say I have a BI tool, and one of its reports runs "select *" on some Hive table every time we refresh it. Or consider this example: I fetch data from Hive into Spark using the JDBC driver, do some transformations on the RDD in Spark, and then write the transformed data back into Hive. The Atlas Hive hook doesn't have the capability to capture those Spark changes. If I have to build a Spark hook for Atlas, how can I write it?

I gave the example of Apache Airflow because they have designed their Hive hooks in Python. All my Atlas REST API calls are written in Python as well, which is why I want to build the hook in Python. I am trying to build end-to-end lineage in Atlas, and I want to capture all changes occurring in Hive. Changes can be:

1. BI tools fetching data out of Hive

2. Apache Spark integration with Hive

Let me know if we can use the existing Hive hook and, if so, how we can use it with other services such as Spark, DbViz, etc.

Thank you,

Subash

Re: Can we build Hive Hook in Python ?

Expert Contributor

@subash sharma,

The only way to integrate Spark with Atlas right now is to call the Atlas API from your Spark application, using either the REST API or the Java API. Atlas has an extensible type system, and you can create your own custom types for creating entities in Atlas.

Atlas API documentation: http://atlas.apache.org/api/v2
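
Since your existing Atlas calls are already in Python, here is a rough sketch of what registering Spark lineage through the v2 REST API could look like. It is only an illustration under several assumptions: the host name, the custom type name `spark_job`, the helper function names, and the `db.table@cluster` qualified names are all hypothetical, and it assumes basic authentication and that the Hive tables already exist in Atlas as `hive_table` entities. The custom type extends the built-in `Process` type so that its `inputs` and `outputs` show up as lineage.

```python
import base64
import json
import urllib.request

# Hypothetical host; 21000 is the default Atlas HTTP port.
ATLAS_URL = "http://atlas-host:21000"


def table_ref(qualified_name):
    """Reference an existing hive_table entity by its unique qualifiedName."""
    return {
        "typeName": "hive_table",
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def spark_process_typedef(type_name="spark_job"):
    """Typedef payload for a custom type (hypothetical name) extending Process."""
    return {
        "entityDefs": [
            {"name": type_name, "superTypes": ["Process"], "attributeDefs": []}
        ]
    }


def spark_process_entity(job_name, inputs, outputs, type_name="spark_job"):
    """Entity payload linking input tables to output tables for lineage."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                # The qualifiedName scheme here is an assumption; any unique string works.
                "qualifiedName": "spark_job." + job_name,
                "name": job_name,
                "inputs": [table_ref(q) for q in inputs],
                "outputs": [table_ref(q) for q in outputs],
            },
        }
    }


def post_json(path, payload, user, password):
    """POST a JSON payload to Atlas with basic auth and return the parsed reply."""
    token = base64.b64encode((user + ":" + password).encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        ATLAS_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

From the Spark driver you would call `post_json("/api/atlas/v2/types/typedefs", spark_process_typedef(), user, password)` once to register the type, and then `post_json("/api/atlas/v2/entity", spark_process_entity(...), user, password)` after each job that reads from and writes to Hive.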
