Atlas tag of files/folders in HDFS

Contributor

Hi,

Is it possible to tag and/or search for a folder or file in HDFS with Atlas? I can't find a clear answer on which services/components can be tagged in Atlas.

Br Anders

1 ACCEPTED SOLUTION

Super Collaborator

1. See the Bridges section at http://atlas.incubator.apache.org/ for the components that Atlas provides hook support for. Currently, when you create a Hive/Sqoop/Falcon/Storm entity that is associated with an HDFS path, that path shows up in Atlas. Any other file or folder created in HDFS does not show up in Atlas.

For example, when you simply create a directory in HDFS, Atlas does not ingest it.

But when you create a Hive table like this:

    CREATE EXTERNAL TABLE test_table (id int, value string) LOCATION '/user/hwx/output';

Atlas creates a lineage graph that shows the relationship between the Hive table and the HDFS path.

You can see the HDFS directories by searching for "hdfs_path", the Hive tables by searching for "hive_table", and so on.
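
A minimal sketch of how those two type searches might be issued over REST instead of through the UI. It assumes the v2 basic-search endpoint (/api/atlas/v2/search/basic) found in later Atlas releases, which may differ from the incubator-era API this thread was written against. The host atlas-host:21000 and the helper name build_search_url are hypothetical. The code only builds the URLs and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint path (Atlas v2 REST API); verify against your Atlas version.
BASIC_SEARCH_PATH = "/api/atlas/v2/search/basic"


def build_search_url(base_url: str, type_name: str, limit: int = 25) -> str:
    """Return a basic-search URL that lists entities of the given Atlas type."""
    query = urlencode({"typeName": type_name, "limit": limit})
    return f"{base_url.rstrip('/')}{BASIC_SEARCH_PATH}?{query}"


if __name__ == "__main__":
    base = "http://atlas-host:21000"  # hypothetical host and port
    # The two type names mentioned in the answer above.
    for atlas_type in ("hdfs_path", "hive_table"):
        print(build_search_url(base, atlas_type))
```

Only paths that a hook has already registered (such as /user/hwx/output from the table above) would be expected in the hdfs_path results.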

2. For the second question: any entity created in Atlas can be tagged.

In the example above, both /user/hwx/output and test_table can be tagged.
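
As a rough illustration, tagging an entity that Atlas already knows about could be scripted along these lines. This is a sketch under assumptions: it uses the v2 REST path /api/atlas/v2/entity/guid/{guid}/classifications from later Atlas releases, the tag name "PII" is hypothetical and would have to exist in Atlas already, and the entity GUID would come from a prior search. The helper build_tag_request is also hypothetical; it builds the request without sending it.

```python
import json
from urllib.request import Request


def build_tag_request(base_url: str, guid: str, tag_name: str) -> Request:
    """Build (but do not send) a POST that attaches one tag to an entity."""
    # Assumed endpoint (Atlas v2 REST API); verify against your Atlas version.
    url = f"{base_url.rstrip('/')}/api/atlas/v2/entity/guid/{guid}/classifications"
    # The request body is a JSON list of classifications to attach.
    body = json.dumps([{"typeName": tag_name}]).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host and tag; the GUID placeholder must be replaced.
    req = build_tag_request("http://atlas-host:21000", "<entity-guid>", "PII")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The same call shape would apply to either entity from the example, the hdfs_path for /user/hwx/output or the hive_table for test_table, since each has its own GUID.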


5 REPLIES

Contributor

Thx...

So this means that tag-based security for individual files or folders in HDFS can't be achieved at the moment, correct?

Super Collaborator

@Anders Boje Larsen

If "individual files or folders in HDFS" means ones that are not associated to any hive/sqoop/storm/falcon entity, then they are not ingested by Atlas and, yes, you cannot tag them at the moment.

Expert Contributor

@Sharmadha Sainath

Some questions:

1. Are there any workarounds for this limitation that other orgs use to apply tags to arbitrary HDFS files and folders (or is this something people just don't do, and if so, why)?

2. By "...associated to any hive/sqoop/storm/falcon entity...", do you mean that a file that is imported via sqoop will show up in Atlas, but if I move that file, the lineage event of that file being moved will not show up?

Contributor

Thx @ssainath