Apache Atlas - Unable to collect HDFS metadata

New Contributor

I want to collect metadata from HDFS. But when I searched, it looks like there is no Atlas hook for HDFS, as there is for Kafka, Hive, and other data sources. Can I get the steps to install an HDFS hook, or the steps to create a custom hook?

1 ACCEPTED SOLUTION

Cloudera Employee

@Nigal,

Currently, when you create a Hive/Sqoop/Falcon/Storm entity that is associated with an HDFS path, the path shows up in Atlas.
Otherwise, files and folders created in HDFS do not show up in Atlas.

For example, when you create a directory in HDFS, Atlas does not ingest it.
But when you create a Hive table like:
"CREATE EXTERNAL TABLE test_table (id int, value string) LOCATION '/user/cloudera/text'"
Atlas creates a lineage graph that shows the relationship between the Hive table and the HDFS path.

You can see the HDFS directories by searching for "hdfs_path" and the Hive tables by searching for "hive_table".
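
The same two searches can probably also be run outside the UI through the Atlas v2 basic search endpoint. The following is only a minimal sketch: the base URL `http://localhost:21000`, the use of basic authentication, and the helper names `basic_search_url` and `search_entities` are assumptions for illustration and will differ per cluster (for example, Kerberos or Knox setups need a different way to authenticate).

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: Atlas is reachable at this address; adjust for your cluster.
ATLAS_URL = "http://localhost:21000"


def basic_search_url(type_name, limit=25, base_url=ATLAS_URL):
    """Build the basic-search URL for one entity type, e.g. 'hdfs_path'."""
    params = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    return f"{base_url}/api/atlas/v2/search/basic?{params}"


def search_entities(type_name, user, password, limit=25, base_url=ATLAS_URL):
    """Return the entity headers Atlas reports for the given type.

    Assumes basic authentication is enabled on the Atlas server.
    """
    request = urllib.request.Request(basic_search_url(type_name, limit, base_url))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The "entities" key is absent when nothing matches.
    return body.get("entities", [])


if __name__ == "__main__":
    # Hypothetical credentials; replace with your own.
    for type_name in ("hdfs_path", "hive_table"):
        for entity in search_entities(type_name, "admin", "admin"):
            print(type_name, entity.get("attributes", {}).get("qualifiedName"))
```

If the external table from the example above exists, the `hdfs_path` search would be expected to list the entity for '/user/cloudera/text', while a directory created directly in HDFS would not appear.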

4 REPLIES

Cloudera Employee

Hello @Nigal,

Yes, that's right. There is no predefined 'HDFS hook' in Atlas.

Atlas mainly collects metadata from Hive, Spark, HBase, and Impala.

An hdfs_path entity is synced only if it belongs to a Hive table's lineage (as explained in https://issues.apache.org/jira/browse/ATLAS-599). By default, Atlas won't fetch HDFS paths.

Unlike Hive entities, HDFS entities in Atlas are created manually using the Create Entity link in the Atlas Web UI.
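
As an alternative to the Create Entity link (not something described in this reply), an hdfs_path entity can in principle also be submitted to the Atlas v2 REST API by POSTing a JSON body to `/api/atlas/v2/entity`. The sketch below only builds that body. The helper name `hdfs_path_entity`, the NameNode address `hdfs://namenode:8020`, the cluster name, and the `path@cluster` form of `qualifiedName` are assumptions for illustration; check the hdfs_path type definition on your own Atlas instance before relying on them.

```python
import json


def hdfs_path_entity(path, cluster_name, name_node="hdfs://namenode:8020"):
    """Build a request body describing one HDFS path as an Atlas entity.

    Assumption: qualifiedName follows the '<full path>@<cluster>' pattern;
    the NameNode address and cluster name are placeholders.
    """
    full_path = f"{name_node}{path}"
    return {
        "entity": {
            "typeName": "hdfs_path",
            "attributes": {
                "qualifiedName": f"{full_path}@{cluster_name}",
                "name": path,
                "path": full_path,
                "clusterName": cluster_name,
            },
        }
    }


# Example: the directory used by the external table earlier in the thread.
payload = hdfs_path_entity("/user/cloudera/text", "cluster1")
print(json.dumps(payload, indent=2))
```

The printed JSON is what would be sent as the request body, with the same authentication your Atlas server requires for the UI.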
 
Please check out the list of available 'hooks' in Atlas:
 
Here's a document on creating hdfs_path manually in Atlas:

New Contributor

Thanks for the solution. But I didn't clearly understand this point: "hdfs_path is synced only if this belongs to a Hive table's lineage". What I understood is that, since Hive runs on top of HDFS, when a Hive lineage is created it will show the HDFS path of the Hive warehouse directory. Is that correct?

New Contributor

Hi @pkr, thanks for the solution. Much appreciated.
