Created 08-09-2018 08:19 AM
Hi all,
I've recently been reading the Atlas code to understand the Atlas architecture. The choice of Kafka as the messaging mechanism caught my interest.
We know that Kafka is designed for ordered message processing, but is ordering really important when hooks report changes?
Thanks,
Eva
Created 08-09-2018 11:29 AM
Yes, it is important.
Consider two events from Hive:
1. Rename a Hive table (example: employee to employee_personal).
2. Add a column to the renamed Hive table (add an address field to employee_personal).
When the Atlas Hive hook is configured, a message is sent for each of these two events.
Say message #2 is received by Atlas first. employee_personal is not yet known to Atlas, so Atlas creates an employee_personal hive_table entity with the address column plus the other columns.
Then, when message #1 is received, Atlas renames the existing employee hive_table entity to employee_personal.
Now there are 2 employee_personal entities in Atlas, whereas in Hive there is only 1 employee_personal table.
Hence, order is *very* important for Atlas as a governance and metadata management framework!
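To make the scenario concrete, here is a rough Python sketch that replays the two messages in both orders against a toy entity store. It is only an illustration: the message layout and the names `apply_message`, `replay`, `rename_msg`, and `add_column_msg` are made up for this example and are not Atlas classes or APIs.

```python
# Toy model of an entity store. All names and message fields below are
# hypothetical; they do not correspond to real Atlas classes or APIs.

def apply_message(store, msg):
    """Apply one hook message to the store (a list of table entities)."""
    if msg["op"] == "rename":
        # Rename every entity that currently carries the old name.
        for entity in store:
            if entity["name"] == msg["old"]:
                entity["name"] = msg["new"]
    elif msg["op"] == "add_column":
        for entity in store:
            if entity["name"] == msg["table"]:
                entity["columns"].append(msg["column"])
                return
        # Table unknown to the store: create it from the full definition
        # carried by the message (assumed here to list all columns).
        store.append({"name": msg["table"], "columns": list(msg["all_columns"])})


def replay(messages):
    """Start from a store that knows only 'employee' and apply the messages in order."""
    store = [{"name": "employee", "columns": ["id", "name"]}]
    for msg in messages:
        apply_message(store, msg)
    return store


rename_msg = {"op": "rename", "old": "employee", "new": "employee_personal"}
add_column_msg = {
    "op": "add_column",
    "table": "employee_personal",
    "column": "address",
    "all_columns": ["id", "name", "address"],
}

in_order = replay([rename_msg, add_column_msg])      # message #1, then #2
out_of_order = replay([add_column_msg, rename_msg])  # message #2, then #1

print(len(in_order), [e["name"] for e in in_order])
print(len(out_of_order), [e["name"] for e in out_of_order])
```

In this sketch the in-order replay ends with a single employee_personal entity that has the address column, while the out-of-order replay ends with two entities that are both named employee_personal.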
Created 08-10-2018 02:52 AM
Thank you very much Sharmadha!
But I still have one puzzle regarding the order. Take the example you presented:
1. Say message #1 has been sent but not processed yet.
2. When action #2 is taken, the hook will consider employee_personal not yet known to Atlas.
3. In this case, message #2 will contain a new entity for this table. Will that also result in 2 employee_personal entities in Atlas?
Thanks,
Eva
Created 08-10-2018 05:53 AM
Eva Xiao, messages are processed in the order in which Atlas receives them.
Only if message #1 fails with an error is message #2 processed without the rename having been applied, and only then can the outcome you describe in step 3 happen.