Is order really important for hook notification?

New Contributor

Hi all,

I've been reading the Atlas code recently to understand its architecture. The choice of Kafka as the messaging mechanism caught my interest.

We know that Kafka is designed for ordered message processing, but is order really important when hooks report changes?

Thanks,

Eva

Re: Is order really important for hook notification?

Expert Contributor

Yes, it is important.

Consider two events from Hive:

1. Rename a Hive table (example: employee to employee_personal).

2. Add a column to the renamed Hive table (add an address field to employee_personal).

When the Atlas Hive hook is configured, a message is sent for each of these two events.

Say message #2 is received by Atlas first. employee_personal is not yet known to Atlas, so Atlas creates an employee_personal hive_table entity with the address column plus the other columns.

Then, when message #1 is received, Atlas renames the existing employee hive_table entity to employee_personal.

Now there are two employee_personal entities in Atlas, whereas in Hive there is only one employee_personal table.

Hence, order is *very* important for Atlas, being a governance and metadata management framework!
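
To make the scenario concrete, here is a minimal sketch that replays the two events in both orders. It is a toy model, not Atlas code: the list of dicts standing in for the entity store, the event shapes, and the names apply_event and replay are all made up for illustration.

```python
# Toy model only: a list of dicts stands in for the Atlas entity store.
# Event shapes and function names are hypothetical, not Atlas APIs.

def apply_event(entities, event):
    """Apply one hook event to the in-memory entity list."""
    if event["type"] == "rename":
        # Rename the first entity that still carries the old name.
        for entity in entities:
            if entity["name"] == event["old"]:
                entity["name"] = event["new"]
                break
    elif event["type"] == "add_column":
        match = next((e for e in entities if e["name"] == event["table"]), None)
        if match is None:
            # Table not known yet: create a new entity for it.
            match = {"name": event["table"],
                     "columns": list(event["existing_columns"])}
            entities.append(match)
        match["columns"].append(event["column"])


def replay(events):
    """Start from a single 'employee' table and apply the events in order."""
    entities = [{"name": "employee", "columns": ["id", "name"]}]
    for event in events:
        apply_event(entities, event)
    return entities


rename = {"type": "rename", "old": "employee", "new": "employee_personal"}
add_col = {"type": "add_column", "table": "employee_personal",
           "existing_columns": ["id", "name"], "column": "address"}

in_order = replay([rename, add_col])      # message #1, then message #2
out_of_order = replay([add_col, rename])  # message #2 arrives first

print([e["name"] for e in in_order])
print([e["name"] for e in out_of_order])
```

In this sketch the in-order replay ends with a single employee_personal entity that has the address column, while the out-of-order replay ends with two entities named employee_personal, which is the mismatch described above.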

Re: Is order really important for hook notification?

New Contributor

Thank you very much, Sharmadha!

But I still have one question about ordering. Take the example you presented:

1. Say message #1 has been sent but not processed yet.

2. When action #2 is taken, the hook will consider employee_personal not yet known to Atlas.

3. In this case, message #2 will contain a new entity for this table. Will that also result in two employee_personal entities in Atlas?

Thanks,

Eva

Re: Is order really important for hook notification?

Expert Contributor

Eva Xiao, messages are processed in the order in which Atlas receives them.

Only if message #1 fails with an error is message #2 processed without it, and only then can the situation you describe in step 3 occur.
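
As a rough illustration of that failure case, the sketch below handles messages strictly in arrival order and skips a message whose handler raises. Everything here is an assumption made for the example: consume_in_order, the handler, the "corrupt" flag used to force a failure, and the set of table names are hypothetical and do not reflect how Atlas actually consumes or retries notifications.

```python
from collections import deque

# Toy model only: names and message shapes are hypothetical, not Atlas APIs.

def consume_in_order(messages, handler):
    """Handle messages strictly in arrival order; a failing message is skipped."""
    queue = deque(messages)
    failed = []
    while queue:
        message = queue.popleft()
        try:
            handler(message)
        except Exception as exc:
            # Record the failure and move on to the next message.
            failed.append((message, exc))
    return failed


tables = {"employee"}  # table names currently known to the toy store


def handler(message):
    if message.get("corrupt"):
        raise ValueError("cannot process message")
    if message["type"] == "rename":
        tables.remove(message["old"])
        tables.add(message["new"])
    elif message["type"] == "add_column":
        # An unknown table is created on the fly.
        tables.add(message["table"])


failed = consume_in_order(
    [
        # Message #1 fails in this run (forced via the made-up "corrupt" flag).
        {"type": "rename", "old": "employee", "new": "employee_personal",
         "corrupt": True},
        {"type": "add_column", "table": "employee_personal", "column": "address"},
    ],
    handler,
)

print(sorted(tables))
print(len(failed))
```

Here the rename is never applied, so the toy store ends up with both employee and employee_personal for what is a single table in Hive. When message #1 succeeds, the add-column message finds the renamed table and no extra entry is created.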