
This article shows how to use the Atlas REST APIs in CDP Public Cloud to create tags and associate them with entities, in preparation for using Ranger tag-based policies.

In the Cloudera CDP Public Cloud offering, Apache Atlas is part of the SDX Data Lake cluster, which is created when you create your first environment:

Introduction to Data Lakes

Prerequisites

A. First, you will need to find the Atlas endpoint using the Cloudera CDP management console:

Accessing Data Lake services

Sample Atlas endpoint: https://pse-722-cdp-xxxxx.cloudera.site/pse-722-cdp-dl/cdp-proxy-api/atlas/api/atlas/
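
The host name and Data Lake name that the variables below expect can be derived from the endpoint with POSIX shell parameter expansion. This is a minimal sketch, assuming the endpoint has the form https://<host>/<datalake-name>/cdp-proxy-api/atlas/...; the variable names atlas_endpoint, rest and url_path are hypothetical.

```shell
# Hypothetical input: paste your own Atlas endpoint here.
# Assumes the form https://<host>/<datalake-name>/cdp-proxy-api/atlas/api/atlas/
atlas_endpoint='https://pse-722-cdp-xxxxx.cloudera.site/pse-722-cdp-dl/cdp-proxy-api/atlas/api/atlas/'

rest=${atlas_endpoint#*://}   # drop the scheme, leaving host/path...
lake_ip=${rest%%/*}           # text before the first '/': the host name (named lake_ip as in this article)
url_path=${rest#*/}           # text after the first '/'
datalake_name=${url_path%%/*} # first path segment: the Data Lake name

echo "lake_ip=${lake_ip}"
echo "datalake_name=${datalake_name}"
```

For the sample endpoint this prints lake_ip=pse-722-cdp-xxxxx.cloudera.site and datalake_name=pse-722-cdp-dl. The name url_path is used rather than path because zsh ties path to PATH.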

B. Next, you will need to set your user's workload password, which is used to authenticate the API calls below:

Setting the workload password

Now you can use the following sample bash code to interact with the Atlas APIs from a CentOS instance outside CDP (the commands use curl, wget, and jq):

From the Atlas endpoint, you can extract the first two parameters below (the Data Lake name and host name). You will also need to set your CDP username and workload password:

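#values taken from the sample Atlas endpoint above (lake_ip holds the Data Lake host name)
export datalake_name='pse-722-cdp-dl'
export lake_ip='pse-722-cdp-xxxxx.cloudera.site'
#your CDP username and workload password
export user='abajwa'
export password='nicepassword'

#-k skips TLS certificate verification; -u sends the credentials using HTTP basic authentication
export atlas_curl="curl -k -u ${user}:${password}"
export atlas_url="https://${lake_ip}:443/${datalake_name}/cdp-proxy-api/atlas/api/atlas"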
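
Because ${atlas_curl} is expanded unquoted, the shell splits it on whitespace and expands glob characters, so a workload password containing spaces, *, ? or [ can break every command. A shell function avoids this by passing the credentials as a single quoted argument. This is a sketch of an alternative; the function name atlas_api is hypothetical, and it assumes user, password and atlas_url are set as above.

```shell
# Hypothetical wrapper: "${user}:${password}" stays one argument, so spaces or
# glob characters in the password reach curl unchanged.
# -k skips TLS verification, -s hides the progress meter.
atlas_api() {
  curl -k -s -u "${user}:${password}" "$@"
}

# Usage (also quote URLs so the shell does not expand '?' or '*'):
# atlas_api "${atlas_url}/v2/types/typedefs"
```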

With these variables set, you can run some basic GET and POST requests to test connectivity and import sample tags and a glossary into Atlas.

#test API connectivity by fetching Atlas typedefs
${atlas_curl} "${atlas_url}/v2/types/typedefs"

#download sample glossary (use the raw URL; a /blob/ URL returns the GitHub HTML page instead of the zip)
wget https://github.com/abajwa-hw/masterclass/raw/master/ranger-atlas/HortoniaMunichSetup/data/export-glossary.zip

#import sample glossary into Atlas using the admin import API (/api/atlas/admin/import)
curl -v -k -X POST -u "${user}:${password}" -H "Accept: application/json" -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F data=@export-glossary.zip "${atlas_url}/admin/import"

#download sample tags (classification typedefs)
wget https://github.com/abajwa-hw/masterclass/raw/master/ranger-atlas/HortoniaMunichSetup/data/classifications.json

#import sample tags into Atlas
curl -v -k -X POST -u "${user}:${password}" -H "Accept: application/json" -H "Content-Type: application/json" "${atlas_url}/v2/types/typedefs" -d @classifications.json
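
classifications.json is a typedefs payload for the v2 typedefs endpoint. To create a single tag without the sample file, you can post a minimal payload with one classificationDefs entry. This is a sketch; the helper name classification_def and the tag PII are hypothetical, and the helper does no JSON escaping, so it assumes the name and description contain no double quotes or backslashes.

```shell
# Hypothetical helper: print a minimal Atlas v2 typedefs payload that defines
# one classification (tag). $1 = tag name, $2 = description (no JSON escaping).
classification_def() {
  printf '{"classificationDefs":[{"name":"%s","description":"%s","superTypes":[],"attributeDefs":[]}]}\n' "$1" "$2"
}

# Usage: pipe the payload to the same typedefs endpoint used above
# (curl reads the request body from stdin with -d @-).
# classification_def PII "Personally identifiable information" |
#   ${atlas_curl} -X POST -H 'Content-Type: application/json' \
#     "${atlas_url}/v2/types/typedefs" -d @-
```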

At this point, you should be able to see the newly imported tags and glossary entities in your Atlas UI.
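
You can also check from the command line that a tag was imported. The sketch below calls GET /v2/types/classificationdef/name/{name}, which should return HTTP 200 when the classification exists and 404 otherwise. The function name tag_exists is hypothetical; it assumes user, password and atlas_url are set as above.

```shell
# Hypothetical check: succeed (exit status 0) only if Atlas reports HTTP 200
# for the named classification typedef. The response body is discarded.
tag_exists() {
  code=$(curl -k -s -o /dev/null -w '%{http_code}' -u "${user}:${password}" \
    "${atlas_url}/v2/types/classificationdef/name/$1")
  [ "$code" = "200" ]
}

# Usage:
# tag_exists REFERENCE_DATA && echo "REFERENCE_DATA is defined"
```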

Next, you can look up any Hive table entity (it should be created in Atlas automatically when the Hive table is created) and associate it with a tag.
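
The lookup below needs the table's exact qualifiedName. When you do not know it, Atlas basic search (GET /v2/search/basic) can list matching hive_table entities first. This is a sketch; the function name search_hive_tables and the limit of 25 are hypothetical, and it assumes user, password and atlas_url are set as above and that the search term needs no URL encoding (letters, digits, '_' and '.').

```shell
# Hypothetical helper: basic search for hive_table entities matching a term.
search_hive_tables() {
  curl -k -s -u "${user}:${password}" \
    "${atlas_url}/v2/search/basic?typeName=hive_table&query=$1&limit=25"
}

# Usage: print the qualifiedName of each match (requires jq; '?' tolerates
# a response with no entities field when nothing matches)
# search_hive_tables airports | jq -r '.entities[]?.attributes.qualifiedName'
```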

#find airlines_new_orc.airports entity in Atlas (qualifiedName format is <db>.<table>@<cluster name>)
#quote the URL so the shell does not treat '?' as a glob character
${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm"

#fetch guid for airlines_new_orc.airports (jq -r prints the raw string without quotes)
guid=$(${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm" | jq -r '.entity.guid')

#use guid to associate the REFERENCE_DATA tag with the airlines_new_orc.airports entity (v2 classifications API)
${atlas_curl} -X POST -H 'Content-Type: application/json' \
"${atlas_url}/v2/entity/guid/${guid}/classifications" \
--data-binary '[{"typeName":"REFERENCE_DATA"}]'

#confirm the entity now shows the REFERENCE_DATA tag (it will also be visible in the UI)
${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm" | grep REFERENCE_DATA
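
To tag several tables, the lookup and association steps can be wrapped in one function that uses only v2 endpoints: GET the hive_table by qualifiedName, then POST /v2/entity/guid/{guid}/classifications. This is a sketch; the function name tag_hive_table is hypothetical, and it assumes user, password and atlas_url are set as above, jq is installed, and qualifiedNames follow the <db>.<table>@<cluster> pattern with the cluster name cm used in this article.

```shell
# Hypothetical helper: tag_hive_table <db> <table> <tag> [cluster, default cm]
tag_hive_table() {
  db=$1 table=$2 tag=$3 cluster=${4:-cm}
  # Look up the entity; '// empty' yields nothing when .entity.guid is missing
  # (for example, when Atlas returns an error body for an unknown table).
  guid=$(curl -k -s -u "${user}:${password}" \
    "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=${db}.${table}@${cluster}" |
    jq -r '.entity.guid // empty')
  if [ -z "$guid" ]; then
    echo "hive_table ${db}.${table}@${cluster} not found" >&2
    return 1
  fi
  # Attach the classification to the entity by guid.
  curl -k -s -u "${user}:${password}" -X POST \
    -H 'Content-Type: application/json' \
    "${atlas_url}/v2/entity/guid/${guid}/classifications" \
    -d "[{\"typeName\":\"${tag}\"}]"
}

# Usage:
# tag_hive_table airlines_new_orc airports REFERENCE_DATA
```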

Now that your entities are tagged, you can use Ranger to create tag-based policies for them.

Tag-based Services and Policies


Other sample code to associate tags
Atlas: How to automate associating tags/classifications to HDFS/Hive/HBase/Kafka entities using REST...