Overview

This article reviews the steps necessary to update the Description and Comment fields of Hive entities within Atlas. The 0.70 Atlas release displays the ‘description’ field and allows text searches on it, but the Atlas UI does not currently support manually entering those properties for a given data asset.

This article examines:

  • Searching for a hive_table entity
  • Updating a single property (‘description’) in the hive_table entity definition

The Problem:

In release 0.70, Atlas can monitor additions and changes to Hive tables and Hive columns. When Atlas identifies a new entry or a change, the appropriate metadata property is updated for that entity. One very cool aspect of Atlas is the ability to conduct either DSL or free-text searches on any property set for an entity. Anyone trying to identify datasets to support a specific analytic activity will definitely appreciate being able to search through all of the entities and quickly discover valuable data assets in the data lake without having to rely on tribal knowledge.

For this article we will locate a specific table by its fully qualified name and then assign a new description to that table. The full source code for the examples covered in this article is available on GitHub. The code for this example is written in Python, and there is a full set of instructions in the repository README.md file.

Locating the Entity whose properties require updating

Now let’s assume that in our ‘HDP’ cluster, within the ‘default’ database, there exists a table named ‘drivers’. For this table, our objective is to change the ‘description’ property from its current value to ‘I get my answers from HCC’. Entity property updates are made one at a time, so our first step is to collect the GUID for our target table.

As this article is about updating a property within a hive_table entity, we will limit the search to identifying a unique hive_table. The query values for this example are:

Property | Value used in this article | How to change the value for your cluster
Atlas server FQDN | server1.hdp | Use the FQDN of your Atlas Metadata Server.
entityType | hive_table | Can be any valid Atlas type.
database name | default | Specify your table's database name.
table name | drivers | Can be any Hive table whose metadata is already in Atlas. The table must already exist on your specified cluster.
Cluster name | HDP | The name of your cluster.

An Atlas entity can be any of a variety of types. The beauty of this architecture is that the same search steps are available whether you are seeking a table, a Hive column, or some other Atlas-managed type. The format we will use for this search example is:

http://{Atlas server FQDN}:21000/api/atlas/entities?type={entityType}&property=qualifiedName&value={database name}.{table name}@{Cluster name}

So for our example, the exact REST query would be:

http://server1.hdp:21000/api/atlas/entities?type=hive_table&property=qualifiedName&value=default.dri...
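As a rough illustration, the search URL can be assembled in Python from the format above and the values in the table. This is only a sketch: the function names build_search_url and fetch_json are made up for this example, and the use of HTTP basic authentication is an assumption about how your Atlas server is secured. Note that urlencode percent-encodes the ‘@’ in the qualified name as %40.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(atlas_host, entity_type, database, table, cluster, port=21000):
    """Build the entity search URL following the format shown in the article."""
    qualified_name = f"{database}.{table}@{cluster}"
    query = urlencode({
        "type": entity_type,
        "property": "qualifiedName",
        "value": qualified_name,  # '@' is percent-encoded as %40
    })
    return f"http://{atlas_host}:{port}/api/atlas/entities?{query}"


def fetch_json(url, user, password):
    """GET the URL and parse the JSON body.

    Assumption: the server accepts HTTP basic authentication.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = Request(url, headers={"Authorization": f"Basic {token}"})
    with urlopen(request) as response:
        return json.load(response)


# Values from the table above; no network call is made here.
search_url = build_search_url("server1.hdp", "hive_table", "default", "drivers", "HDP")
print(search_url)
```

Calling fetch_json(search_url, user, password) against a live server should then return the JSON document discussed next.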

The full result of this REST query, shown below, contains the GUID necessary for the update along with all of the hive_table’s metadata:

{ 
"definition": { 
"id": { 
"id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0  },  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",  
"traitNames": [ 
"TLC"  ],  
"traits": {  "TLC": { 
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",  
"typeName": "TLC",  
"values": {}  }  },  
"typeName": "hive_table",   "values":
{ 
"aliases": null,  
"columns": [  { 
"id": { 
"id": "1690ccc2-d7be-45af-becb-c6b360a1a30f",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_column",  
"version": 0  }, 
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",  
"traitNames": [],  
"traits": {},  
"typeName": "hive_column",  
"values": { 
"comment": null,  
"description": null,  
"name": "driverid",  
"owner": "hive",  
"qualifiedName": "default.drivers.driverid@HDP",    "table": { 
"id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0 
},  
"type": "varchar(15)"  }  },   { 
"id": { 
"id": "249a7ce3-6b19-418e-9094-7d8a30bc596f",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",    "typeName":
"hive_column",  
"version": 0  }, 
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",  
"traitNames": [   "CARRIER"  ], 
"traits": { 
"CARRIER": { 
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",  
"typeName": "CARRIER",  
"values": {} 
}  }, 
"typeName": "hive_column",  
"values": { 
"comment": null,  
"description": null,  
"name": "companyid",  
"owner": "hive",  
"qualifiedName": "default.drivers.companyid@HDP",  
"table": { 
"id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0 
},  
"type": "varchar(15)"  }  },   { 
"id": { 
"id": "d3b9557a-5ad0-4585-a9af-e1fed24569fc",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_column",  
"version": 0   },  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",  
"traitNames": [],  
"traits": {},  
"typeName": "hive_column",    "values": { 
"comment": null,  
"description": null,  
"name": "customer",  
"owner": "hive",  
"qualifiedName": "default.drivers.customer@HDP",  
"table": { 
"id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0 
},  
"type": "varchar(40)"  }  },   { 
"id": { 
"id": "143479a3-be79-4f04-b649-4a09b5429ace",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_column",  
"version": 0  }, 
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",  
"traitNames": [],  
"traits": {},  
"typeName": "hive_column",  
"values": { 
"comment": null,  
"description": null,  
"name": "drivername",  
"owner": "hive",  
"qualifiedName": "default.drivers.drivername@HDP",  
"table": { 
"id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",  
 "jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0 
},  
"type": "varchar(75)"  }  },   { 
"id": { 
"id": "6c3123a9-0d09-490b-840d-6cc012ab69e0",  
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id", 
"state": "ACTIVE",  
"typeName": "hive_column",  
"version": 0  }, 
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference", 
"traitNames": [],  
"traits": {},  
"typeName": "hive_column",  
"values": { 
"comment": null,    "description": null, 
"name": "yearsdriving",  
"owner": "hive",  
"qualifiedName": "default.drivers.yearsdriving@HDP", 
"table": {   "id":
"b78b5541-a205-4f9e-8b81-e20632a88ad5",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0 
},  
"type": "int"  }  },   { 
"id": { 
"id": "a419ed9f-df56-41cc-90bc-1c00a4d3c428",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_column",  
"version": 0  }, 
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",  
"traitNames": [],  
"traits": {},    "typeName":
"hive_column",  
"values": { 
"comment": null,  
"description": null,  
"name": "riskscore",  
"owner": "hive",    "qualifiedName":
"default.drivers.riskscore@HDP",  
"table": { 
"id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",  
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id", 
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0 
},  
"type": "varchar(5)"  }  }  ],  
"comment": null,  
"createTime": "2016-10-11T17:11:11.000Z",  
"db": { 
"id": "332189cc-d994-44c2-8f87-29a28a471434",  
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id", 
"state": "ACTIVE",  
"typeName": "hive_db",  
"version": 0  }, "description":
"\"changeMe\"",  
"lastAccessTime": "2016-10-11T17:11:11.000Z",  
"name": "drivers",  
"owner": "hive",  
"parameters": { 
"COLUMN_STATS_ACCURATE":
"{\"BASIC_STATS\":\"true\"}",  
"EXTERNAL": "TRUE",  
"numFiles": "1",  
"numRows": "4278",  
"rawDataSize": "1967880",  
"totalSize": "68597",  
"transient_lastDdlTime": "1476205880"  },  
"partitionKeys": null,  
"qualifiedName": "default.drivers@HDP",  
"retention": 0,  
"sd": { 
"id": { 
"id": "36166469-1014-4645-98a6-9df34b37a145",  
"jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",  
"state": "ACTIVE",  
"typeName": "hive_storagedesc",  
"version": 0  },    "jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",  
"traitNames": [],  
"traits": {},  
"typeName": "hive_storagedesc",  
"values": { 
"bucketCols": null,  
"compressed": false,  
"inputFormat":
"org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",  
"location":
"hdfs://server1.hdp:8020/apps/hive/warehouse/drivers",  
"numBuckets": -1,  
"outputFormat":
"org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",  
"parameters": null,  
"qualifiedName": "default.drivers@HDP_storage",  
"serdeInfo": {   "jsonClass":
"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",  
"typeName": "hive_serde",  
"values": { 
"name": null,  
"parameters": { 
"serialization.format": "1" 
},  
"serializationLib":
"org.apache.hadoop.hive.ql.io.orc.OrcSerde" 
}  }, 
"sortCols": null,  
"storedAsSubDirectories": false,  
"table": { 
"id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",  
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id", 
"state": "ACTIVE",  
"typeName": "hive_table",  
"version": 0  }  }  },  
"tableType": "EXTERNAL_TABLE",  
"temporary": false,  
"viewExpandedText": null,  
"viewOriginalText": null  }  },  
"requestId": "qtp511473681-34831 - b088be5b-44e6-4a2c-bd4a-7beeb059cf4f"}

In the result set above, locate the "id" property value, which is the GUID, and the "description" property, which currently has the value "changeMe".

We will use the definition.id.id value from the REST query results, ‘b78b5541-a205-4f9e-8b81-e20632a88ad5’, in our next REST query to update the property value.
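For readers scripting this step, the two values can be pulled out of the parsed response with a few dictionary lookups. The sketch below uses a heavily trimmed copy of the response shown above; the function name extract_guid_and_description is just an illustrative choice.

```python
def extract_guid_and_description(response):
    """Return (guid, description) from a parsed entity search response."""
    definition = response["definition"]
    guid = definition["id"]["id"]                          # definition.id.id
    description = definition["values"].get("description")  # may be None
    return guid, description


# Trimmed-down version of the search result shown in the article.
sample_response = {
    "definition": {
        "id": {
            "id": "b78b5541-a205-4f9e-8b81-e20632a88ad5",
            "state": "ACTIVE",
            "typeName": "hive_table",
            "version": 0,
        },
        "typeName": "hive_table",
        "values": {
            "description": "\"changeMe\"",
            "name": "drivers",
            "qualifiedName": "default.drivers@HDP",
        },
    },
}

guid, description = extract_guid_and_description(sample_response)
print(guid)
print(description)
```

Note that the stored description in this result includes literal double-quote characters around changeMe, so the extracted string carries them as well.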

Updating an Entity's Property value

Now that we have the GUID, it is time to update the ‘description’ property from ‘changeMe’ to ‘I get my answers from HCC’.

The update entity property REST command requires the GUID from the prior search step. To update the property, we will use the Atlas POST entity REST command, following the URL query format below, and include the string "I get my answers from HCC" in the POST message payload:

http://{Atlas server FQDN}:21000/api/atlas/entities/{GUID from prior search operation}?property={atlas property field name}

So to finish our example, with our payload containing the string "I get my answers from HCC", the actual query would be:

http://server1.hdp:21000/api/atlas/entities/b78b5541-a205-4f9e-8b81-e20632a88ad5?property=description
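To make the role of the payload concrete, here is a minimal Python sketch of the POST request. The new description does not appear in the URL; it travels in the request body. Several details are assumptions rather than facts from this article: the helper names build_update_request and send, the application/json content type, the use of HTTP basic authentication, and sending the text as a raw string without surrounding quotes.

```python
import base64
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_update_request(atlas_host, guid, prop, value, user, password, port=21000):
    """Build (but do not send) the POST request that updates one property."""
    query = urlencode({"property": prop})
    url = f"http://{atlas_host}:{port}/api/atlas/entities/{quote(guid)}?{query}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=value.encode("utf-8"),  # the new property value is the payload
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumption
            "Authorization": f"Basic {token}",   # assumption: basic auth
        },
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urlopen(request) as response:
        return json.load(response)


# Placeholder credentials; replace with your own. Nothing is sent here.
update_request = build_update_request(
    "server1.hdp",
    "b78b5541-a205-4f9e-8b81-e20632a88ad5",
    "description",
    "I get my answers from HCC",
    user="your-user",
    password="your-password",
)
print(update_request.get_method(), update_request.full_url)
```

Calling send(update_request) against a live server would perform the update. The responses shown in this article contain literal quote characters inside the stored description, which suggests the payload is stored verbatim, so whether to wrap the text in quotes is worth checking on your own cluster.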

The result from the above command will be the current Metadata definition for our drivers table in JSON format as shown below:

{… 
"description": "\"I get my answers from HCC\"",  
"lastAccessTime": "2016-10-11T17:11:11.000Z",  
"name": "drivers",  
"owner": "hive",  
"parameters": { 
"COLUMN_STATS_ACCURATE":
"{\"BASIC_STATS\":\"true\"}",  
"EXTERNAL": "TRUE",  
"numFiles": "1",  
"numRows": "4278",  
"rawDataSize": "1967880",  
"totalSize": "68597",  
"transient_lastDdlTime": "1476205880"}

Now let's go take a look at the Atlas UI, and check on the description for the drivers table. As we see in the screen print below, the new description property value has been successfully changed:

[Screenshot: 8663-atlas.png, the Atlas UI showing the updated description for the drivers table]

Next Steps:

This article uses a simple property change example to illustrate the techniques necessary to modify the Atlas metadata for a given entity. After you have run through this example completely, some follow-on activities to experiment with include:

  • Changing properties for different entity types such as hive_column or any of the HBase types.
  • Changing some of the other top-level property fields.
  • Going through a list of Hive tables and changing each 'description' property to a value from an external source.
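As a starting point for the last idea, the sketch below reads qualified names and descriptions from CSV text and applies them one entity at a time. It is an illustration under stated assumptions: the CSV column names, the function names, and the second table (default.timesheet@HDP) are hypothetical, and the lookup and update steps are passed in as callables so the loop can be tried with in-memory stand-ins before pointing it at the REST calls described in this article.

```python
import csv
import io


def load_descriptions(csv_text):
    """Parse CSV text with 'qualified_name' and 'description' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["qualified_name"]: row["description"] for row in reader}


def apply_descriptions(descriptions, find_guid, update_property):
    """Update the 'description' of every entity that can be found.

    find_guid(qualified_name) returns a GUID or None.
    update_property(guid, property_name, value) performs one update.
    """
    updated, missing = [], []
    for qualified_name, description in descriptions.items():
        guid = find_guid(qualified_name)
        if guid is None:
            missing.append(qualified_name)
            continue
        update_property(guid, "description", description)
        updated.append(qualified_name)
    return updated, missing


# In-memory stand-ins for the search and update REST calls.
CSV_TEXT = """qualified_name,description
default.drivers@HDP,I get my answers from HCC
default.timesheet@HDP,Hypothetical second table
"""
known_guids = {"default.drivers@HDP": "b78b5541-a205-4f9e-8b81-e20632a88ad5"}
calls = []

updated, missing = apply_descriptions(
    load_descriptions(CSV_TEXT),
    find_guid=known_guids.get,
    update_property=lambda guid, prop, value: calls.append((guid, prop, value)),
)
print("updated:", updated)
print("not found:", missing)
```

Because property updates are made one at a time, each table costs one search and one update request, so it is worth logging the names that could not be found rather than stopping at the first miss.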

Comments

@mjohnson Thanks for the detailed explanation on updating entities. I have a question about the command you used to update the description of the entity.

The command you used to update the description doesn't contain the actual string that needs to be replaced.

Do we need to add it to the command when executing it? Something like the below:

http://server1:21000/api/atlas/entities/b78b5541-a205-4f9e-8b81-e20632a88ad5?property=description:"I get my answers from HCC"

Thanks