Created on 12-23-2016 05:01 PM
Data governance is unique to each organization, and every organization needs to track a different set of properties for its data assets. Fortunately, Atlas provides the flexibility to add new data asset properties to support your organization’s data governance requirements. The objective of this article is to describe the steps, using the Atlas REST API, to add new properties to your Atlas types.
To keep this article simple, we will focus on the three steps required to add a custom property to the standard Atlas type ‘hive_table’ and enable it for display. Following these steps, you should be able to modify the ‘hive_table’ Atlas type and add custom properties whose values can be entered, viewed in the Atlas UI, and searched.
To make the article easier to read, the JSON file is shown in small chunks. To view the full JSON file, as well as other files used to research this article, check out this repo.
The most important step in this process is properly defining the JSON used to update your Atlas type. There are three parts to the JSON object we will pass to Atlas: the type header, the new property definition, and the definitions of the type’s existing properties.
Frankly, the header is just a set of standard JSON elements that are repeated every time you define a new property. The only change we need to make to the header block shown below for each example is to set the ‘typeName’ element correctly. In our case, we want to add a property for all Hive tables, so we have set the typeName to ‘hive_table’.
{
  "enumTypes": [],
  "structTypes": [],
  "traitTypes": [],
  "classTypes": [
    {
      "superTypes": ["DataSet"],
      "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
      "typeName": "hive_table",
      "typeDescription": null,
Keep in mind that all the JSON elements shown above pertain to the Atlas type which we plan to modify.
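Since the header is boilerplate that only varies by type name, it can be generated rather than hand-edited. Below is a minimal Python sketch (the repo associated with this article uses Python) that wraps a list of attribute definitions in the v1 typesystem envelope shown above; the function name `build_type_update` is my own, not part of Atlas.

```python
import json

# Sketch: wrap attribute definitions in the standard v1 typesystem envelope.
# The field names mirror the JSON header shown above; the attribute
# definitions themselves are filled in later in the article.
def build_type_update(type_name, attribute_definitions):
    """Return the request body for updating an existing Atlas class type."""
    return {
        "enumTypes": [],
        "structTypes": [],
        "traitTypes": [],
        "classTypes": [{
            "superTypes": ["DataSet"],
            "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
            "typeName": type_name,
            "typeDescription": None,
            "attributeDefinitions": attribute_definitions,
        }],
    }

payload = build_type_update("hive_table", [])
print(json.dumps(payload, indent=2))
```

This keeps the only variable part, the typeName, as a parameter, so the same helper works for any class type you want to extend.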
For this example, we are adding a property called ‘DataOwner’ which we intend to contain the owner of the data from a governance perspective. For our purposes, we have the following requirements:
| Requirement | Attribute Property | Assignment |
| --- | --- | --- |
| The property is searchable | isIndexable | true |
| The property will contain a string | dataTypeName | string |
| Not all Hive tables will have an owner | multiplicity | optional |
| A data owner can be assigned to multiple Hive tables | isUnique | false |
Based on the above requirements, we end up with a property definition as shown below:
{
  "name": "DataOwner",
  "dataTypeName": "string",
  "multiplicity": "optional",
  "isComposite": false,
  "isUnique": false,
  "isIndexable": true,
  "reverseAttributeName": null
},
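If you are scripting the update in Python, the same attribute definition can be expressed as a plain dict and serialized with the standard library. This is a sketch of the DataOwner definition from the requirements table, with comments tying each field back to a requirement:

```python
import json

# The DataOwner attribute definition from the requirements table,
# expressed as a Python dict ready to be serialized to JSON.
data_owner_attr = {
    "name": "DataOwner",
    "dataTypeName": "string",      # the property will contain a string
    "multiplicity": "optional",    # not every Hive table must have an owner
    "isComposite": False,
    "isUnique": False,             # one owner may be assigned to many tables
    "isIndexable": True,           # makes the property searchable
    "reverseAttributeName": None,
}

print(json.dumps(data_owner_attr, indent=2))
```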
As shown in the full JSON file, it is possible to define multiple properties in a single update, so take your time and try to define all of your properties at once.
An annoying aspect of the Atlas v1 REST API is the need to include the type’s other key properties in your JSON file. For this example, which was run on HDP 2.5.3, I had to define quite a few existing properties. And every time you add a new custom property, you must include the previously added custom properties in your JSON as well. If you check out the JSON file used for this example, you will find a long list of properties which are required as of HDP 2.5.0.
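One way to avoid dropping an existing property is to fetch the current type definition first and merge your new attributes into it. The sketch below assumes `existing` holds the `attributeDefinitions` list returned by a GET of the current type; the function name `merge_attributes` is mine, not an Atlas API.

```python
# Sketch: merge new attribute definitions with the ones already present on
# the type, so no existing property is dropped from the update payload.
def merge_attributes(existing, new):
    """Return existing attribute definitions plus any new ones not already present."""
    known = {attr["name"] for attr in existing}
    return existing + [a for a in new if a["name"] not in known]

# `existing` would normally come from a GET of the current type definition.
existing = [{"name": "stats:numRows", "dataTypeName": "string"}]
merged = merge_attributes(existing, [{"name": "DataOwner", "dataTypeName": "string"}])
print([a["name"] for a in merged])  # -> ['stats:numRows', 'DataOwner']
```

Deduplicating by name also makes the script safe to re-run after a partial update.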
We now have the full JSON request constructed with our new property requirements, so it is time to PUT the JSON file using the Atlas REST API v1. For the text of this article I am using ‘curl’ to keep the examples clear; the associated repo uses Python to make life a little easier.
To execute the PUT REST request we will first need to collect the following data elements:
| Data Element | Where to find it |
| --- | --- |
| Atlas Admin User Id | A defined ‘administrative’ user for the Atlas system. It is the same user id which you use to log into Atlas. |
| Atlas Password | The password associated with the Atlas Admin User Id. |
| Atlas Server | The Atlas Metadata Server. This can be found by selecting the Atlas service in Ambari and looking in the Summary tab. |
| Atlas Port | Normally 21000. Check the Atlas configs in Ambari for the specific port in your cluster. |
| update_hive_table_type.json | The name of the JSON file containing our new Atlas property definition. |
curl -iv -d @update_hive_table_type.json --header "Content-Type: application/json" -u {Atlas Admin User Id}:{Atlas Password} -X PUT http://{Atlas Server}:{Atlas Port}/api/atlas/types
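If you prefer to stay in Python, the same PUT can be issued with just the standard library. This is a hedged sketch: the server name and credentials are placeholders, and the request is only constructed here, not sent, until you uncomment the `urlopen` call.

```python
import base64
import json
import urllib.request

# Sketch: build the same PUT request as the curl command, using only the
# Python standard library. Server, port, and credentials are placeholders.
def build_put_request(server, port, payload, user, password):
    url = "http://{}:{}/api/atlas/types".format(server, port)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )
    req.add_header("Content-Type", "application/json")
    # Basic auth header, matching curl's -u option
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req

req = build_put_request("atlas.example.com", 21000, {"classTypes": []}, "admin", "admin")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```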
If all is successful, we should see a result like the one shown below. The only thing you need to verify in the result (other than the lack of any reported errors) is that the “name” element matches the Atlas type to which you are adding the new custom property.
{
  "requestId": "qtp1177377518-235-fcf1c6f4-5993-49ac-8f5b-cdaafd01f2c0",
  "types": [ { "name": "hive_table" } ]
}
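That check is easy to automate. The sketch below parses a response of the shape shown above and verifies the expected type name appears; the response text here is a hypothetical stand-in for what your server returns.

```python
import json

# Sketch: verify the PUT response acknowledges the type you intended to
# update. `response_text` stands in for the body returned by the server.
response_text = '{"requestId": "qtp1177377518-235", "types": [{"name": "hive_table"}]}'

result = json.loads(response_text)
updated = [t["name"] for t in result["types"]]
assert "hive_table" in updated, "unexpected types in response: %s" % updated
print("updated types:", updated)  # -> updated types: ['hive_table']
```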
However, if you are like me, you will probably make a couple of mistakes along the way. To help you identify the root cause of your errors, here is a short list of common errors and how to resolve them:
Error #1: Old Attribute is missing:
An error like the one shown below occurs because the JSON containing your new custom property is missing an existing property.
{
"error": "hive_table can't be updated - Old Attribute stats:numRows is missing",
"stackTrace": "org.apache.atlas.typesystem.types.TypeUpdateException: hive_table can't be updated - Old Attribute stats:numRows is missing\n\tat
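When scripting updates, it can help to pull the missing attribute name out of the error message programmatically so you know exactly which property to add back. A small sketch, keyed to the error format shown above:

```python
import re

# Sketch: extract the missing attribute name from the v1 error message so
# you know which existing property to add back into your JSON.
def missing_attribute(error_message):
    m = re.search(r"Old Attribute (\S+) is missing", error_message)
    return m.group(1) if m else None

print(missing_attribute(
    "hive_table can't be updated - Old Attribute stats:numRows is missing"
))  # -> stats:numRows
```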
The solution is to add the missing property, along with your custom property, to your JSON file. If you are uncertain of the exact definition of the property, execute the Atlas REST API GET call shown below to list the current definition of the Atlas type you are modifying:
curl -u {Atlas Admin User Id}:{Atlas Password} -X GET http://{Atlas Server}:{Atlas Port}/api/atlas/types
Error #2: Unknown datatype:
An error like the one below occurred:
{
"error": "Unknown datatype: XRAY",
"stackTrace": "org.apache.atlas.typesystem.exception.TypeNotFoundException: Unknown
In this case, you have entered an incorrect Atlas data type. The allowed data types include: boolean, byte, short, int, long, float, double, biginteger, bigdecimal, date, string, and {custom types}.
The {custom types} option enables you to reference another Atlas type. For example, if you decide to create a ‘SecurityRules’ Atlas data type which itself contains a list of properties, you would simply use ‘SecurityRules’ as the data type of the property.
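Concretely, such a property definition looks just like the string example earlier, with the custom type name in dataTypeName. In the sketch below, ‘SecurityRules’ is the hypothetical custom type from the text, and the property name ‘AppliedSecurityRules’ is my own invention; the custom type would have to be registered in Atlas before this definition is accepted.

```python
# Hedged sketch: a property whose dataTypeName references another Atlas type.
# 'SecurityRules' is the hypothetical custom type discussed in the text and
# must already exist in Atlas; 'AppliedSecurityRules' is a made-up name.
security_attr = {
    "name": "AppliedSecurityRules",
    "dataTypeName": "SecurityRules",  # reference to the custom type
    "multiplicity": "optional",
    "isComposite": False,
    "isUnique": False,
    "isIndexable": False,
    "reverseAttributeName": None,
}

print(security_attr["dataTypeName"])  # -> SecurityRules
```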
This is the reason why you ALWAYS want to modify Atlas types and properties in a sandbox development environment. DO NOT EXPERIMENT WITH CUSTOMIZING ATLAS TYPES IN PRODUCTION!!!!! If you ignore this standard step of most organizations’ SDLC, your only recourse may be to delete the Atlas service from within Ambari, re-add the service, and then re-add all of your data. Not fun.
As we see above, our new custom Atlas ‘hive_table’ property is now visible in the Atlas UI for all tables. Because the property was just defined for all ‘hive_table’ data assets, its value is null. Your next step, which is covered in the article Modify Atlas Entity properties using REST API commands, is to assign a value to the new property.