Created on 05-05-2018 12:03 AM
While automating setup of Hortoniabank demo, we needed to automate the task of associating Atlas tags to HDP entities like HDFS, Hive, HBase, Kafka using the names of entities (rather than their guids in Atlas). One option is to use Atlas APIs to find the entity you are looking for using qualifiedName attribute and then use the guid to associates tag to it.
For components like Hive that already have Atlas hook, the Atlas entities for Hive tables will automatically be created when the table is created. For these, we have just provided the API calls to associate the tags with the entity.
For others like Kafka, HDFS, Hbase etc that do not have an Atlas hook (as of HDP 2.6.x), you will need to create the entity first. For these, we have provided both the API call to create the entity and the call to associate the tags with the entity.
The below code examples assume the tags have already been created. these can be created either manually via Atlas UI or using the API. Here is a sample Atlas API call to create a basic tag called TEST that does not have any attributes.
${atlas_curl} ${atlas_url}/types \ -X POST -H 'Content-Type: application/json' \ --data-binary '{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"TEST","typeDescription":"TEST","typeVersion":"1.0","attributeDefinitions":[]}],"classTypes":[]}'
All the examples operate the same way: find the guid of the entity you are looking for using qualifiedName attribute and then use the guid to associates tag to it.
First we setup common vars:
atlas_host="atlas.domain.com" cluster_name="datalake" atlas_curl="curl -u admin:admin" atlas_url="http://${atlas_host}:21000/api/atlas"
Example 1: Associate tag REFERENCE_DATA (w/o attributes) to Hive table hortoniabank.eu_countries
#fetch guid for table hortoniabank.eu_countries@${cluster_name} guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=hortoniabank.eu_countries@${cluster_name} | jq '.entity.guid' | tr -d '"') #add REFERENCE_DATA tag ${atlas_curl} ${atlas_url}/entities/${guid}/traits \ -X POST -H 'Content-Type: application/json' \ --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"REFERENCE_DATA","values":{}}'
Example 2: Associate tag DATA_QUALITY (with attribute: score and value: 0.51) to Hive table cost_savings.claim_savings
#fetch guid for table cost_savings.claim_savings@${cluster_name} guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=cost_savings.claim_savings@${cluster_name} | jq '.entity.guid' | tr -d '"') #add DATA_QUALITY tag with score=0.51 ${atlas_curl} ${atlas_url}/entities/${guid}/traits \ -X POST -H 'Content-Type: application/json' \ --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"DATA_QUALITY", "values":{"score": "0.51"}}'
Example 3: Associate tag FINANCE_PII (with attribute: type and value:finance) to Hive column finance.tax_2015.ssn
#fetch guid for finance.tax_2015.ssn guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_column?attr:qualifiedName=finance.tax_2015.ssn@${cluster_name} | jq '.entity.guid' | tr -d '"') #add FINANCE_PII tag with type=finance ${atlas_curl} ${atlas_url}/entities/${guid}/traits \ -X POST -H 'Content-Type: application/json' \ --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"FINANCE_PII", "values":{"type": "finance"}}'
Example 4: Create entity for kafka topic PRIVATE and associate with tag SENSITIVE
#create entities for kafka topics PRIVATE and associate with SENSITIVE tag ${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF { "entity":{ "typeName":"kafka_topic", "attributes":{ "description":null, "name":"PRIVATE", "owner":null, "qualifiedName":"PRIVATE@${cluster_name}", "topic":"PRIVATE", "uri":"none" }, "guid":-1 }, "referredEntities":{ }} EOF guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/kafka_topic?attr:qualifiedName=PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"') ${atlas_curl} ${atlas_url}/entities/${guid}/traits \ -X POST -H 'Content-Type: application/json' \ --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}'
Example 5: create entities for Hbase table T_PRIVATE and associate with SENSITIVE tag
#create entities for Hbase table T_PRIVATE and associate with SENSITIVE tag ${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF { "entity":{ "typeName":"hbase_table", "attributes":{ "description":"T_PRIVATE table", "name":"T_PRIVATE", "owner":"hbase", "qualifiedName":"T_PRIVATE@${cluster_name}", "column_families":[ ], "uri":"none" }, "guid":-1 }, "referredEntities":{ }} EOF guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hbase_table?attr:qualifiedName=T_PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"') ${atlas_curl} ${atlas_url}/entities/${guid}/traits \ -X POST -H 'Content-Type: application/json' \ --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}'
Example 6: create entities for HDFS path /banking and associate with BANKING tag
#create entities for HDFS path /banking and associate with BANKING tag hdfs_prefix="hdfs://$(hostname -f):8020" hdfs_path="/banking" ${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF { "entity":{ "typeName":"hdfs_path", "attributes":{ "description":null, "name":"${hdfs_path}", "owner":null, "qualifiedName":"${hdfs_prefix}${hdfs_path}", "clusterName":"${cluster_name}", "path":"${hdfs_prefix}${hdfs_path}" }, "guid":-1 }, "referredEntities":{ }} EOF guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hdfs_path?attr:qualifiedName=${hdfs_prefix}${hdfs_path} | jq '.entity.guid' | tr -d '"') ${atlas_curl} ${atlas_url}/entities/${guid}/traits \ -X POST -H 'Content-Type: application/json' \ --data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"BANKING","values":{}}'