<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Atlas REST API: get the guid of a Hive column in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-get-the-guid-of-a-Hive-column/m-p/243045#M204844</link>
    <description>&lt;P&gt;This is now the winning REST API query:&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_table+where+qualifiedName%3D%27testdb.mytable.id@CLUSTERNAME%27"&lt;/PRE&gt;&lt;P&gt;It gives a list of all columns for a table, including deleted ones. In my Python code I pick the column with ACTIVE state.&lt;/P&gt;</description>
    <pubDate>Thu, 07 Feb 2019 22:28:15 GMT</pubDate>
    <dc:creator>marcel-jan</dc:creator>
    <dc:date>2019-02-07T22:28:15Z</dc:date>
    <item>
      <title>Atlas REST API: get the guid of a Hive column</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-get-the-guid-of-a-Hive-column/m-p/243043#M204842</link>
      <description>&lt;P&gt;I want to be able to tag a list of Hive columns in Atlas (HDP 2.6.5) from a Python script. For this I need to find the guid of the Hive column I'm targetting. But I can't seem to tame the Atlas REST API to do that.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Getting all the Hive columns in the data lake is easy enough:&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET 'https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column+where+__state=%27ACTIVE%27+select+qualifiedName,name,__guid'&lt;/PRE&gt;&lt;P&gt;That works (the number of Hive columns is limited to 100 by default):&lt;/P&gt;&lt;PRE&gt;{
  "requestId": "pool-2-thread-6 - 7e8ec2ee-a94b-4e37-b851-57b1b3df6e29",
  "query": "hive_column where __state='ACTIVE' select qualifiedName,name,__guid",
  "queryType": "dsl",
  "count": 100,
  "results": [
    {
      "$typeName$": "__tempQueryResultStruct5",
      "qualifiedName": "mytestdb.employee.name@CLUSTERNAME",
      "__guid": "809c84d5-e065-45f4-81c8-fcdb7cf81560",
      "name": "name"
    },
    {
      "$typeName$": "__tempQueryResultStruct5",
      "qualifiedName": "mytestdb.othertest.id@CLUSTERNAME",
      "__guid": "04cd03fb-31d3-43e0-a75a-f35f2a60ad22",
      "name": "id"
    },
etcetera..
 }
}&lt;/PRE&gt;&lt;P&gt;Now let's pick just one hive_column:&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column+where+name='id'"
&lt;/PRE&gt;&lt;P&gt;Which results in a 500 server error.&lt;/P&gt;&lt;P&gt;Really? Okay, let's try selecting on the qualifiedName of the column:&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column+where+qualifiedName='mytestdb.othertest.id@CLUSTERNAME'"&lt;/PRE&gt;&lt;P&gt;Don't think so: HTTP ERROR 500.&lt;/P&gt;&lt;P&gt;Okay, wise guy. How about I query the table, but then really ask for the qualifiedName of the column?&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_table+where+qualifiedName='mytestdb.othertest.id@CLUSTERNAME'"&lt;/PRE&gt;&lt;P&gt;That works, but I don't get any data on the column.&lt;/P&gt;&lt;P&gt;Okay, let's have a look at the fields in the output. Let's ask for all the Hive columns again, but don't put in the select.&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column&amp;amp;__state=%27ACTIVE%27"&lt;/PRE&gt;&lt;P&gt;I get results, but mixed in there are deleted states. For example:&lt;/P&gt;&lt;PRE&gt;    {
      "$typeName$": "hive_column",
      "$id$": {
        "id": "b3cc1db6-9c95-499c-9f5f-c457a4e4add6",
        "$typeName$": "hive_column",
        "version": 0,
        "state": "DELETED"
      },
      "$systemAttributes$": {
        "createdBy": "myaccount",
        "modifiedBy": "myaccount",
        "createdTime": "Wed Jun 27 15:07:24 CEST 2018",
        "modifiedTime": "Mon Aug 06 16:30:30 CEST 2018"
      },
      "comment": null,
      "qualifiedName": "mytestdb.othertest.id@CLUSTERNAME",
      "type": "string",
      "position": 30,
      "owner": "myaccount",
      "description": null,
      "name": "id",
      "table": {
        "id": "161b4ef8-d5de-4e38-a76d-c030a71b730c",
        "$typeName$": "hive_table",
        "version": 0,
        "state": "DELETED"
      }
    },&lt;/PRE&gt;&lt;P&gt; Okay, let's remove DELETED states in all the fields.&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_column&amp;amp;results.table.state=%27ACTIVE%27&amp;amp;__state=%27ACTIVE%27&amp;amp;__id.state=%27ACTIVE%27"&lt;/PRE&gt;&lt;P&gt;Nice try, but here are the DELETED states again.&lt;/P&gt;&lt;P&gt;I feel like I'm playing a chess match agains the Atlas REST API and I'm not winning. Is there any way to do a query against the Atlas REST API to just get the qualifiedName and guid of one Hive column?&lt;/P&gt;</description>
      <pubDate>Wed, 30 Jan 2019 00:08:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-get-the-guid-of-a-Hive-column/m-p/243043#M204842</guid>
      <dc:creator>marcel-jan</dc:creator>
      <dc:date>2019-01-30T00:08:28Z</dc:date>
    </item>
    <item>
      <title>Re: Atlas REST API: get the guid of a Hive column</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-get-the-guid-of-a-Hive-column/m-p/243044#M204843</link>
      <description>&lt;P&gt;I've been another half day at it. I come to the conclusion that the best that I can get is a list of all columns in a table, including ones with a DELETED state.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Feb 2019 23:55:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-get-the-guid-of-a-Hive-column/m-p/243044#M204843</guid>
      <dc:creator>marcel-jan</dc:creator>
      <dc:date>2019-02-06T23:55:14Z</dc:date>
    </item>
    <item>
      <title>Re: Atlas REST API: get the guid of a Hive column</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-get-the-guid-of-a-Hive-column/m-p/243045#M204844</link>
      <description>&lt;P&gt;This is now the winning REST API query:&lt;/P&gt;&lt;PRE&gt;curl -u myaccount -i -H "Content-Type: application/json" -X GET "https://atlasnode.domain.com:21000/api/atlas/discovery/search/dsl?query=hive_table+where+qualifiedName%3D%27testdb.mytable.id@CLUSTERNAME%27"&lt;/PRE&gt;&lt;P&gt;It gives a list of all columns for a table, including deleted ones. In my Python code I pick the column with ACTIVE state.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Feb 2019 22:28:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-get-the-guid-of-a-Hive-column/m-p/243045#M204844</guid>
      <dc:creator>marcel-jan</dc:creator>
      <dc:date>2019-02-07T22:28:15Z</dc:date>
    </item>
  </channel>
</rss>

