Atlas tag of files/folders in HDFS
- Labels: Apache Atlas, Apache Hadoop
Created 10-20-2016 01:06 PM
Hi,
Is it possible to tag and/or search for a folder or file with Atlas? I can't find a clear answer on which services/components can be tagged in Atlas.
Br Anders
Created 10-20-2016 01:25 PM
1. Look at the Bridges section at http://atlas.incubator.apache.org/ to see which components Atlas provides hook support for. Currently, when you create a Hive/Sqoop/Falcon/Storm entity that is associated with an HDFS path, that path shows up in Atlas. Otherwise, a file or folder created in HDFS doesn't show up in Atlas.
For example, when you create a directory in HDFS, Atlas doesn't ingest it.
But when you create a Hive table like:
"CREATE EXTERNAL TABLE test_table (id int, value string) LOCATION '/user/hwx/output'"
Atlas creates a lineage graph that shows the relationship between the Hive table and the HDFS path.
You can see the HDFS directories by searching for "hdfs_path", the Hive tables by searching for "hive_table", and so on.
2. For the second question, any entity created in Atlas can be tagged.
In the above example, both /user/hwx/output and test_table can be tagged.
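If you would rather script the search and tagging than click through the UI, here is a minimal Python sketch. It only builds the requests and does not send them. Everything in it is an assumption on my side rather than something confirmed in this thread: the host name is made up, the tag name "PII" and the GUID placeholder are hypothetical, and the endpoint paths follow the Atlas v2 REST API, so older releases may expose different routes.

```python
import json
from urllib.parse import urlencode

# Assumption: hypothetical Atlas host; 21000 is the usual default port.
ATLAS_BASE = "http://atlas.example.com:21000"


def basic_search_url(type_name, base=ATLAS_BASE, limit=25):
    """Build a basic-search URL listing entities of one type, e.g. hdfs_path."""
    # Assumption: Atlas v2 REST API route; check it against your Atlas version.
    query = urlencode({"typeName": type_name, "limit": limit})
    return f"{base}/api/atlas/v2/search/basic?{query}"


def classification_request(guid, tag_name, base=ATLAS_BASE):
    """Return (url, json_body) for a POST that attaches a tag to an entity."""
    # Assumption: Atlas v2 REST API route; the body is a JSON list of tags.
    url = f"{base}/api/atlas/v2/entity/guid/{guid}/classifications"
    body = json.dumps([{"typeName": tag_name}])
    return url, body


if __name__ == "__main__":
    # Entities Atlas already ingested through a hook, by type.
    print(basic_search_url("hdfs_path"))
    print(basic_search_url("hive_table"))

    # "<entity-guid>" stands for the guid returned by a search; "PII" is a
    # hypothetical tag that must already be defined in Atlas.
    url, body = classification_request("<entity-guid>", "PII")
    print("POST", url)
    print(body)
```

The search only returns entities Atlas already knows about, so the /user/hwx/output path appears under hdfs_path only because the Hive table references it.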
Created 10-20-2016 01:37 PM
Thx...
Does this mean that security based on tags on individual files or folders in HDFS can't be solved at the moment? Correct?
Created 10-20-2016 01:57 PM
"individual files or folders in HDFS": if this means a path that is not associated to any hive/sqoop/storm/falcon entity, then it is not ingested by Atlas and, yes, you cannot tag it at the moment.
Created 07-11-2019 09:14 PM
Some questions:
1. Are there any workarounds for this limitation that other orgs use to apply tags to arbitrary HDFS files and folders (or is this something people just don't do, and if so, why)?
2. By "...associated to any hive/sqoop/storm/falcon entity...", do you mean that a file imported via Sqoop will show up in Atlas, but if I move that file, the lineage event for the move will not show up?
Created 10-20-2016 01:58 PM
Thx @ssainath