Created on 11-17-2021 01:50 PM - edited on 11-18-2021 12:28 AM by subratadas
This article shows how to use the Atlas APIs in CDP-public to create tags and associate them with entities, in preparation for use with Ranger's tag-based policies.
In Cloudera's CDP-public offering, Apache Atlas is part of the SDX DataLake cluster that is created when you create your first Environment.
Pre-requisites
A. First, you will need to find the Atlas endpoint using the Cloudera CDP management console:
Sample Atlas endpoint: https://pse-722-cdp-xxxxx.cloudera.site/pse-722-cdp-dl/cdp-proxy-api/atlas/api/atlas/
B. Next, you will need to set your user's workload password.
With both in place, you can use the following sample bash code to interact with the Atlas APIs from a CentOS instance outside CDP.
The first two variables below can be read off the Atlas endpoint. You will also need to set your username and workload password:
export datalake_name='pse-722-cdp-dl'
export lake_ip='pse-722-cdp-xxxxx.cloudera.site'
export user='abajwa'
export password='nicepassword'
export atlas_curl="curl -k -u ${user}:${password}"
export atlas_url="https://${lake_ip}:443/${datalake_name}/cdp-proxy-api/atlas/api/atlas"
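Rather than copying the two values out of the endpoint by hand, they can be derived with shell parameter expansion. The following is a minimal sketch that assumes the endpoint has the same shape as the sample shown earlier (`https://<host>/<datalake name>/cdp-proxy-api/...`); the `endpoint`, `no_scheme` and `rest` variable names are illustrative and not part of the original script.

```shell
# Assumption: the endpoint looks like the sample Atlas endpoint from the article.
endpoint='https://pse-722-cdp-xxxxx.cloudera.site/pse-722-cdp-dl/cdp-proxy-api/atlas/api/atlas/'

# Strip the scheme, then split on the first "/" to get the host.
no_scheme="${endpoint#https://}"
lake_ip="${no_scheme%%/*}"

# The next path segment is the Data Lake name.
rest="${no_scheme#*/}"
datalake_name="${rest%%/*}"

export lake_ip datalake_name
echo "lake_ip=${lake_ip}"
echo "datalake_name=${datalake_name}"
```

If your endpoint has a different layout, set the two variables manually as shown above.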
After forming the above variables, you can use them to run some basic GET and POST commands to import tags and glossary into Atlas.
#test API by fetching Atlas typedefs
${atlas_curl} ${atlas_url}/v2/types/typedefs
#download sample Glossary
wget https://github.com/abajwa-hw/masterclass/raw/master/ranger-atlas/HortoniaMunichSetup/data/export-glossary.zip
#import sample Glossary into Atlas
curl -v -k -X POST -u "${user}:${password}" -H "Accept: application/json" -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F data=@export-glossary.zip "${atlas_url}/admin/import"
#download sample tags
wget https://github.com/abajwa-hw/masterclass/raw/master/ranger-atlas/HortoniaMunichSetup/data/classifications.json
#import sample tags into Atlas
curl -v -k -X POST -u ${user}:${password} -H "Accept: application/json" -H "Content-Type: application/json" ${atlas_url}/v2/types/typedefs -d @classifications.json
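A download can silently save an HTML page instead of the archive (for example, when a GitHub `blob` page URL is fetched rather than the raw file), and the import then fails with a confusing error. As a hedged sketch, the check below looks at the first two bytes of the file, which are `PK` for any ZIP archive. The `is_zip` helper is hypothetical and not part of the original script.

```shell
# Hypothetical helper: succeed only if the file starts with the ZIP magic bytes "PK".
is_zip() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Warn before importing if the downloaded glossary is not a real archive.
if [ -e export-glossary.zip ] && ! is_zip export-glossary.zip; then
  echo "export-glossary.zip is not a ZIP archive; re-download it from a raw URL" >&2
fi
```

This only confirms the file type, not that the archive contains a valid Atlas export.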
At this point, you should be able to see the newly imported tags and glossary entities in your Atlas UI.
Next, you can search for any Hive entity (this should get automatically created in Atlas when the Hive table is created) and associate it with a tag.
#find airlines_new_orc.airports entity in Atlas (URL is quoted so the shell does not treat "?" as a glob)
${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm"
#fetch guid for airlines_new_orc.airports (jq -r prints the raw string without quotes)
guid=$(${atlas_curl} -s "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm" | jq -r '.entity.guid')
#use guid to associate a tag REFERENCE_DATA to airlines_new_orc.airports entity
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"REFERENCE_DATA","values":{}}'
#confirm now entity shows REFERENCE_DATA tag (also will be visible via UI)
${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm" | grep REFERENCE_DATA
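The tagging call above uses the legacy `/entities/{guid}/traits` endpoint. As an alternative, the sketch below uses the Atlas v2 endpoint `POST /v2/entity/guid/{guid}/classifications`, which takes a JSON list of classifications. It also guards against an empty or `null` guid, which is what the lookup yields when the entity is not found. The function names (`valid_guid`, `classification_payload`, `tag_entity_v2`) are hypothetical, and the sketch assumes `atlas_curl` and `atlas_url` are set as earlier in the article; whether the legacy or the v2 endpoint is preferable may depend on your Atlas version.

```shell
# Succeed only for a usable guid; jq prints "null" when .entity.guid is absent.
valid_guid() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# Build the request body: a JSON list with one classification and no attributes.
classification_payload() {
  printf '[{"typeName":"%s"}]' "$1"
}

# Hypothetical wrapper: $1 = entity guid, $2 = classification (tag) name.
# Assumes atlas_curl and atlas_url are already exported.
tag_entity_v2() {
  if ! valid_guid "$1"; then
    echo "no guid found; check the qualifiedName used in the lookup" >&2
    return 1
  fi
  ${atlas_curl} -X POST -H 'Content-Type: application/json' \
    --data-binary "$(classification_payload "$2")" \
    "${atlas_url}/v2/entity/guid/$1/classifications"
}

# Usage, once guid has been fetched:
# tag_entity_v2 "${guid}" REFERENCE_DATA
```

The payload builder does no JSON escaping, so it is only suitable for simple tag names such as `REFERENCE_DATA`.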
Now that you have tagged entities, you can use Ranger to create a "tag-based policy".
Tag-based Services and Policies
Other sample code to associate tags
Atlas: How to automate associating tags/classifications to HDFS/Hive/HBase/Kafka entities using REST...