
HiveMetaStoreBridge code - only creating lineage for TableType.EXTERNAL_TABLE and if process == null ? for .7rc2


Explorer

Hi,

I configured my Eclipse debugger to step through import-hive.sh (HiveMetaStoreBridge.java) because I could not figure out why no lineage was being created, even though the metadata of the Hive tables was being imported.

After some time I found that the code on line 274 only allows tables of type EXTERNAL_TABLE to be considered for lineage. Why? What is the rationale behind this check? Is there any technical reason why a managed table cannot be considered for metadata lineage?

Second, it checks getProcessReference(tableQualifiedName). What is this actually checking? Lineage is created only if this returns null. Why?

So, as an experiment, I modified the code in my environment to remove the check, and voilà, I am now getting lineage for all my tables.

Regards,

Russ Anderson

IBM Consultant /Metadata center of excellence


Re: HiveMetaStoreBridge code - only creating lineage for TableType.EXTERNAL_TABLE and if process == null ? for .7rc2

Guru

That is very interesting. I would assume the check was for lineage of imported data but I see your point about it working for a managed table as well. What version of Atlas are you using?


Re: HiveMetaStoreBridge code - only creating lineage for TableType.EXTERNAL_TABLE and if process == null ? for .7rc2

New Contributor

The rationale behind creating lineage only for external tables is that the source HDFS path for an external table is not known inherently. For a managed table, the source HDFS path is guaranteed to be {HIVE_DATA_ROOT}/{TABLENAME}. Lineage information might not add much value for a managed table.

The check getProcessReference(tableQualifiedName) == null ensures that the lineage process is registered only if it doesn't already exist in the Atlas database.
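
To make the two conditions concrete, here is a minimal sketch of how such a guard could behave. It is not the actual HiveMetaStoreBridge source: the class LineageGuardSketch, the in-memory map standing in for the Atlas store, the importTable helper, and the example qualified names are all assumptions made for illustration. Only the names TableType.EXTERNAL_TABLE and getProcessReference(tableQualifiedName) come from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; this is NOT the real Atlas HiveMetaStoreBridge code.
class LineageGuardSketch {

    // Simplified stand-in for Hive's table type enum.
    enum TableType { MANAGED_TABLE, EXTERNAL_TABLE }

    // Hypothetical in-memory stand-in for the Atlas store:
    // table qualified name -> id of the registered lineage process.
    private final Map<String, String> registeredProcesses = new HashMap<>();

    // Returns the id of the lineage process already registered for the table,
    // or null if none exists (mirrors the null check discussed in the thread).
    String getProcessReference(String tableQualifiedName) {
        return registeredProcesses.get(tableQualifiedName);
    }

    // Lineage is registered only for external tables that have no process yet.
    boolean shouldRegisterLineage(TableType type, String tableQualifiedName) {
        return type == TableType.EXTERNAL_TABLE
                && getProcessReference(tableQualifiedName) == null;
    }

    // Hypothetical import step: returns true if a lineage process was created.
    boolean importTable(TableType type, String tableQualifiedName) {
        if (!shouldRegisterLineage(type, tableQualifiedName)) {
            return false; // managed table, or lineage already present
        }
        registeredProcesses.put(tableQualifiedName, "process:" + tableQualifiedName);
        return true;
    }

    public static void main(String[] args) {
        LineageGuardSketch bridge = new LineageGuardSketch();
        // Example qualified names are made up for this sketch.
        System.out.println(bridge.importTable(TableType.MANAGED_TABLE, "default.orders@cluster1"));
        System.out.println(bridge.importTable(TableType.EXTERNAL_TABLE, "default.clicks@cluster1"));
        // Re-running the import for the same external table does not register a duplicate.
        System.out.println(bridge.importTable(TableType.EXTERNAL_TABLE, "default.clicks@cluster1"));
    }
}
```

Under this sketch, removing the table-type condition (as in the experiment described in the question) would make managed tables register lineage too, while the null check would still prevent duplicate processes on repeated imports.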
