To capture real-time events occurring in Hive, I am thinking of writing a Hive hook. Since I am not well versed in Java, can I use Python to build a Hive hook?
If I can't, how do I implement or customize the Atlas Hive hook, which is written in Java?
Does the Apache Airflow Hive hook work in a similar fashion?
Thank you in advance.
Hey @Nixon Rodrigues, the Hive hook generally captures changes occurring in Hive, but it doesn't capture anything when we run a "select". Let's say I have a BI tool and one of its reports runs "select *" on some Hive table every time we refresh it. Or consider this example: I fetch data into Spark from Hive using the JDBC driver, and after doing some transformations on the RDD in Spark, we write the transformed data back into Hive. The Atlas Hive hook doesn't have the capability to capture Spark changes. If I have to build a Spark hook for Atlas, how can I write one? I gave the example of Apache Airflow because they have designed their Hive hooks in Python, and all of my Atlas REST API calls are written in Python, which is why I want to build the hook in Python. I am trying to build end-to-end lineage in Atlas, and I want to capture all changes occurring in Hive. These changes can be:
1. Some BI tools fetching Data out of Hive
2. Apache Spark integration with Hive
Let me know if we can use the existing Hive hook and, if so, how we can use it with other services such as Spark, DbViz, etc.
Currently, the only way to integrate Spark with Atlas is to call the Atlas API from your Spark application, using either the REST API or the Java API. Atlas has an extensible type system, and you can create your own custom types for the entities you register in Atlas.
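Since your Atlas calls are already in Python, here is a minimal sketch of pushing an entity to Atlas over the v2 REST API. The host, port, and cluster name are hypothetical placeholders; the `hive_table` attribute names and the `db.table@cluster` qualifiedName convention follow the built-in Hive types as documented in the Atlas v2 API, but verify them against your Atlas version.

```python
import json
import urllib.request

ATLAS_URL = "http://atlas-host:21000"  # hypothetical host and port


def build_hive_table_entity(table_name, db_name, cluster="prod"):
    """Build an Atlas v2 entity payload for a Hive table.

    qualifiedName uses the db.table@cluster convention that the
    Atlas Hive hook itself follows for built-in hive_table entities.
    """
    return {
        "entity": {
            "typeName": "hive_table",
            "attributes": {
                "name": table_name,
                "qualifiedName": f"{db_name}.{table_name}@{cluster}",
            },
        }
    }


def create_entity(payload):
    # POST /api/atlas/v2/entity creates or updates the entity.
    # Authentication (basic auth, Kerberos, etc.) is omitted here
    # and depends on how your Atlas instance is secured.
    req = urllib.request.Request(
        f"{ATLAS_URL}/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

You would call `create_entity(build_hive_table_entity("orders", "default"))` from your Spark job (for example, right after writing back to Hive) so the write shows up in Atlas.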
Atlas API documentation: http://atlas.apache.org/api/v2
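For the custom-type part, a sketch of registering your own entity type (say, for a Spark job) through the v2 typedefs endpoint might look like this. The type name `spark_job` and its attribute are made up for illustration; the payload shape follows the `POST /api/atlas/v2/types/typedefs` request format in the Atlas docs, so check it against the version you run.

```python
def build_spark_job_typedef():
    """Build an Atlas v2 typedefs payload for a hypothetical spark_job type.

    Deriving from the built-in Process super-type is what lets Atlas
    draw lineage between the job's inputs and outputs.
    """
    return {
        "entityDefs": [
            {
                "name": "spark_job",  # hypothetical custom type name
                "superTypes": ["Process"],
                "attributeDefs": [
                    {
                        "name": "sparkAppId",  # illustrative custom attribute
                        "typeName": "string",
                        "isOptional": True,
                        "cardinality": "SINGLE",
                    }
                ],
            }
        ]
    }
```

You would POST this payload once to `/api/atlas/v2/types/typedefs`, then create `spark_job` entities with `inputs` and `outputs` pointing at the `hive_table` entities to get end-to-end lineage.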