Atlas tag of files/folders in HDFS

Contributor

Hi,

Is it possible to tag and/or search for a folder or file in HDFS with Atlas? I can't find a clear answer on which services/components can be tagged in Atlas.

Br Anders

1 ACCEPTED SOLUTION

Re: Atlas tag of files/folders in HDFS

Expert Contributor

1. Look at the Bridges section at http://atlas.incubator.apache.org/ to see which components Atlas provides hook support for. Currently, when you create a hive/sqoop/falcon/storm entity that is associated with an HDFS path, that path shows up in Atlas. Otherwise, a file/folder created in HDFS doesn't show up in Atlas.

For example, when you create a directory in HDFS, Atlas doesn't ingest it.

But when you create a Hive table like:

CREATE EXTERNAL TABLE test_table (id int, value string) LOCATION '/user/hwx/output';

Atlas creates a lineage graph that shows the relationship between the Hive table and the HDFS path.

You can see the HDFS directories by searching for "hdfs_path", the Hive tables by searching for "hive_table", and so on.
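As a rough illustration of that type-based search, the sketch below builds the request URLs only and sends nothing. It assumes the Atlas v2 REST API with its basic search endpoint (/api/atlas/v2/search/basic and the typeName parameter); the Atlas version discussed in this thread may predate that API, and the host, port, and function name are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Atlas server address; replace with your own.
ATLAS_BASE = "http://atlas-host:21000"


def basic_search_url(type_name, limit=25, base=ATLAS_BASE):
    """Build a basic-search URL that lists entities of one Atlas type.

    Assumes the Atlas v2 REST API; older releases expose different paths.
    """
    query = urlencode({"typeName": type_name, "limit": limit})
    return f"{base}/api/atlas/v2/search/basic?{query}"


# HDFS paths that Atlas learned about through a hook (e.g. a Hive table location).
print(basic_search_url("hdfs_path"))
# Hive tables registered by the Hive hook.
print(basic_search_url("hive_table"))
```

Issuing a GET against such a URL with valid credentials should return the matching entities, but check the REST documentation for your Atlas version before relying on it.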

2. For the second question: any entity created in Atlas can be tagged.

In the above example, both /user/hwx/output and test_table can be tagged.
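Tagging is usually done in the Atlas UI, but it can also be scripted. The sketch below only constructs the request and does not send it. It assumes the Atlas v2 REST API, where a tag is a "classification" attached to an entity by its GUID; the host, the GUID placeholder, the tag name "PII", and the function name are all hypothetical, and the tag type must already be defined in Atlas.

```python
import json
from urllib.request import Request

# Hypothetical Atlas server address; replace with your own.
ATLAS_BASE = "http://atlas-host:21000"


def build_tag_request(entity_guid, tag_name, base=ATLAS_BASE):
    """Build (but do not send) a request that attaches a tag to an entity.

    Assumes the Atlas v2 REST API, which takes a JSON list of
    classifications for the entity identified by its GUID.
    """
    url = f"{base}/api/atlas/v2/entity/guid/{entity_guid}/classifications"
    body = json.dumps([{"typeName": tag_name}]).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder GUID: look up the real one for the hdfs_path or hive_table entity first.
req = build_tag_request("GUID-OF-HDFS-PATH", "PII")
print(req.get_method(), req.full_url)
```

The same request shape would apply to the test_table entity; only the GUID changes.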

Re: Atlas tag of files/folders in HDFS

Contributor

Thx...

Does this mean that tag-based security on individual files or folders in HDFS can't be done at the moment? Correct?

Re: Atlas tag of files/folders in HDFS

Expert Contributor

@Anders Boje Larsen

"individual files or folders in HDFS" - if this means a file or folder that is not associated with any hive/sqoop/storm/falcon entity, then it is not ingested by Atlas and, yes, you cannot tag it at the moment.

Re: Atlas tag of files/folders in HDFS

Contributor

@Sharmadha Sainath

Some questions:

1. Are there any workarounds for this limitation that other orgs use to apply tags to arbitrary HDFS files and folders (or is this something people just don't do, and if so, why)?

2. By "...associated to any hive/sqoop/storm/falcon entity...", do you mean that a file that is imported via sqoop will show up in Atlas, but if I move that file, the lineage event of that file being moved will not show up?

Re: Atlas tag of files/folders in HDFS

Contributor

Thx @ssainath