Discovering existing Hive tables in Atlas
Labels: Apache Atlas, Apache Hive
Created 01-24-2019 04:02 PM
Is there a way to let Atlas discover existing Hive tables (in HDP 2.6.5)? I've got a couple of Hive external tables that existed before Atlas was enabled. It didn't pick them up, but Atlas does find new tables.
I'm trying to get one Hive table into Atlas, but it's not exactly easy to do that for tables, columns, storagedesc and references. There are very few (simple) examples out there, and I'm getting one 'ObjectId is not valid AtlasObjectId' error after another.
Created 01-28-2019 03:29 PM
I've been trying this for a day and a half now, approaching it from all kinds of directions (writing the JSON from scratch, modifying exported JSON data from another table, trying the old REST API version). I've been searching the web for any solution and I'm about to give up. Does anyone have other ideas?
Created 01-28-2019 03:50 PM
@Marcel-Jan Krijgsman Run the /usr/hdp/current/atlas-server/hook-bin/import-hive.sh utility, which imports the existing Hive tables into Atlas.
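A minimal invocation sketch, assuming the default HDP layout (the script prompts for an Atlas admin user and picks up the Hive configuration on its own, as the run output further down the thread shows):

    # Run from the Atlas hook-bin directory; the script prompts for
    # Atlas admin credentials and reads the Hive config from /etc/hive/conf.
    cd /usr/hdp/current/atlas-server/hook-bin
    ./import-hive.sh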
Created 01-28-2019 04:01 PM
Thanks, @Sandeep Nemuri. I'll try that.
Created 01-28-2019 04:07 PM
So frustrating indeed. Have you tried running the Hive import from /usr/hdp/2.6.5.0-292/atlas/hook-bin? The output should look like the below:
    # ./import-hive.sh
    Using Hive configuration directory [/etc/hive/conf]
    Log file for import is /usr/hdp/current/atlas-server/logs/import-hive.log
    log4j:WARN No such property [maxFileSize] in org.apache.log4j.PatternLayout.
    log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.PatternLayout.
    Enter username for atlas :- admin
    Enter password for atlas :-
    Hive Meta Data imported successfully!!!
After it runs successfully, you should be able to see your tables in Atlas.
Created 01-28-2019 04:33 PM
It seems to be the same script which I mentioned above, isn't it?
Created 02-01-2019 03:30 PM
I think we responded at almost the same time; when someone clicks submit, there is no logic that checks whether a similar answer has already been given 🙂 Maybe you should have added that he needs to run the script as the Atlas admin user, as illustrated, which he wasn't aware of 🙂
Created 01-29-2019 08:21 AM
Thanks for the responses everyone. I saw I needed to have the Atlas admin account to run this script. I don't have it, but I've asked the people who do (a company that maintains the Hadoop cluster for us) to run it for me.
While we're waiting, it might be interesting to share the one thing that I did get working. So I put this in the JSON:
{ "entity": { "typeName": "hive_table", "attributes": { "description": null, "name": "tablename", "owner": "degierf", "qualifiedName": "test.tablename@clustername", "replicatedFrom": [], "replicatedTo": [], "aliases": [], "columns": [], "comment": null, "createTime": 1494949241, "db": { "guid": "557c073c-da51-461c-8bba-3594e004db63", "typeName": "hive_db" }, "lastAccessTime": 1494949241, "partitionKeys": [], "retention": null, "sd": null, "tableType": null, "temporary": null, "viewExpandedText": null, "viewOriginalText": null }, "guid": -1 }, "referredEntities": {} }
Then I ran it with the entity endpoint:
curl -u myaccount -i -H "Content-Type: application/json" -X POST -d @upload_hive_table_def.json https://atlasnode:21000/api/atlas/v2/entity
It works, but it wasn't the desired result: I managed to get Atlas to recognise that there's a Hive table, just not the columns, references, etc. I was hoping to add those one by one after this, but that didn't work.
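What I plan to try next, in case the import script route falls through, is to pass the columns and the storage descriptor in referredEntities with negative placeholder GUIDs, and have the table's columns/sd attributes point at those GUIDs, so everything is created in a single POST. This is only a sketch: the file name, column name, HDFS location and the exact attribute set for hive_column/hive_storagedesc are my assumptions, not something I've verified yet.

    # Sketch only: hypothetical file name, column name and location.
    cat > upload_hive_table_with_columns.json <<'EOF'
    {
      "entity": {
        "typeName": "hive_table",
        "guid": "-1",
        "attributes": {
          "qualifiedName": "test.tablename@clustername",
          "name": "tablename",
          "owner": "degierf",
          "db": { "guid": "557c073c-da51-461c-8bba-3594e004db63", "typeName": "hive_db" },
          "columns": [ { "guid": "-2", "typeName": "hive_column" } ],
          "sd": { "guid": "-3", "typeName": "hive_storagedesc" }
        }
      },
      "referredEntities": {
        "-2": {
          "typeName": "hive_column",
          "guid": "-2",
          "attributes": {
            "qualifiedName": "test.tablename.some_column@clustername",
            "name": "some_column",
            "type": "string",
            "table": { "guid": "-1", "typeName": "hive_table" }
          }
        },
        "-3": {
          "typeName": "hive_storagedesc",
          "guid": "-3",
          "attributes": {
            "qualifiedName": "test.tablename@clustername_storage",
            "location": "hdfs://clusternode/apps/hive/warehouse/test.db/tablename",
            "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "compressed": false,
            "table": { "guid": "-1", "typeName": "hive_table" }
          }
        }
      }
    }
    EOF

    # Same endpoint as before
    curl -u myaccount -i -H "Content-Type: application/json" -X POST \
      -d @upload_hive_table_with_columns.json \
      https://atlasnode:21000/api/atlas/v2/entity

The idea is that each placeholder GUID in referredEntities gets resolved when the request is processed, so the table, its column and the storage descriptor are created and linked in one call instead of one by one.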
Created 01-31-2019 05:56 PM
import-hive.sh was indeed the solution. We only got it working once the admin user had read/execute access in Ranger to the protected data in HDFS. Now all Hive tables have been discovered.
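For anyone who wants to check the result afterwards, a quick sketch using the v2 basic search API (hostname and account are placeholders, same as in the earlier curl examples):

    # List hive_table entities now known to Atlas
    curl -u myaccount -G "https://atlasnode:21000/api/atlas/v2/search/basic" \
         --data-urlencode "typeName=hive_table" \
         --data-urlencode "limit=25"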
