Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Atlas REST API: get the guid of a Hive column

Solved Go to solution
Highlighted

Atlas REST API: get the guid of a Hive column

Contributor

I want to be able to tag a list of Hive columns in Atlas (HDP 2.6.5) from a Python script. For this I need to find the guid of the Hive column I'm targetting. But I can't seem to tame the Atlas REST API to do that.


Getting all the Hive columns in the data lake is easy enough:

curl -u myaccount -i -H "Content-Type: application/json" -X GET 'https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column+where+__state=%27ACTIVE%27+select+qualifiedName,name,__guid'

That works (the number of Hive columns is limited to 100 by default):

{
  "requestId": "pool-2-thread-6 - 7e8ec2ee-a94b-4e37-b851-57b1b3df6e29",
  "query": "hive_column where __state='ACTIVE' select qualifiedName,name,__guid",
  "queryType": "dsl",
  "count": 100,
  "results": [
    {
      "$typeName$": "__tempQueryResultStruct5",
      "qualifiedName": "mytestdb.employee.name@CLUSTERNAME",
      "__guid": "809c84d5-e065-45f4-81c8-fcdb7cf81560",
      "name": "name"
    },
    {
      "$typeName$": "__tempQueryResultStruct5",
      "qualifiedName": "mytestdb.othertest.id@CLUSTERNAME",
      "__guid": "04cd03fb-31d3-43e0-a75a-f35f2a60ad22",
      "name": "id"
    },
etcetera..
 }
}

Now let's pick just one hive_column:

curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column+where+name='id'"

Which results in a 500 server error.

Really? Okay, let's try selecting on the qualifiedName of the column:

curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column+where+qualifiedName='mytestdb.othertest.id@CLUSTERNAME'"

Don't think so: HTTP ERROR 500.

Okay, wise guy. How about I query the table, but then really ask for the qualifiedName of the column?

curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_table+where+qualifiedName='mytestdb.othertest.id@CLUSTERNAME'"

That works, but I don't get any data on the column.

Okay, let's have a look at the fields in the output. Let's ask for all the Hive columns again, but don't put in the select.

curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column&__state=%27ACTIVE%27"

I get results, but mixed in there are deleted states. For example:

    {
      "$typeName$": "hive_column",
      "$id$": {
        "id": "b3cc1db6-9c95-499c-9f5f-c457a4e4add6",
        "$typeName$": "hive_column",
        "version": 0,
        "state": "DELETED"
      },
      "$systemAttributes$": {
        "createdBy": "myaccount",
        "modifiedBy": "myaccount",
        "createdTime": "Wed Jun 27 15:07:24 CEST 2018",
        "modifiedTime": "Mon Aug 06 16:30:30 CEST 2018"
      },
      "comment": null,
      "qualifiedName": "mytestdb.othertest.id@CLUSTERNAME",
      "type": "string",
      "position": 30,
      "owner": "myaccount",
      "description": null,
      "name": "id",
      "table": {
        "id": "161b4ef8-d5de-4e38-a76d-c030a71b730c",
        "$typeName$": "hive_table",
        "version": 0,
        "state": "DELETED"
      }
    },

Okay, let's remove DELETED states in all the fields.

curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column&results.table.state=%27ACTIVE%27&__state=%27ACTIVE%27&__id.state=%27ACTIVE%27"

Nice try, but here are the DELETED states again.

I feel like I'm playing a chess match agains the Atlas REST API and I'm not winning. Is there any way to do a query against the Atlas REST API to just get the qualifiedName and guid of one Hive column?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Atlas REST API: get the guid of a Hive column

Contributor

This is now the winning REST API query:

curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_table+where+qualifiedName%3D%27testdb.mytable.id@CLUSTERNAME%27"

It gives a list of all columns for a table, including deleted ones. In my Python code I pick the column with ACTIVE state.

2 REPLIES 2

Re: Atlas REST API: get the guid of a Hive column

Contributor

I've been another half day at it. I come to the conclusion that the best that I can get is a list of all columns in a table, including ones with a DELETED state.

Re: Atlas REST API: get the guid of a Hive column

Contributor

This is now the winning REST API query:

curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_table+where+qualifiedName%3D%27testdb.mytable.id@CLUSTERNAME%27"

It gives a list of all columns for a table, including deleted ones. In my Python code I pick the column with ACTIVE state.

Don't have an account?
Coming from Hortonworks? Activate your account here