Overview

Data governance is unique to each organization, and every organization needs to track a different set of properties for its data assets. Fortunately, Atlas provides the flexibility to add new data asset properties to support your organization’s data governance requirements. The objective of this article is to describe the steps, using the Atlas REST API, for adding new properties to your Atlas types.

Add a new Property for an existing Atlas Type

To simplify this article, we will focus on the three steps required to add a custom property to the standard Atlas type ‘hive_table’ and enable it for display. Following these steps, you should be able to modify the ‘hive_table’ Atlas type and add custom properties whose values can be entered, viewed in the Atlas UI, and searched.

To make the article easier to read, the JSON file is shown in small chunks. To view the full JSON file, as well as other files used in researching this article, check out this repo.

Step 1: Define the custom property JSON

The most important step in this process is properly defining the JSON used to update your Atlas type. There are three parts to the JSON object we will pass to Atlas:

  • The header – contains the type identifier and some other meta information required by Atlas
  • The actual new property definition
  • The required existing Atlas type properties

Defining the Header

Frankly, the header is just a set of standard JSON elements that gets repeated every time you define a new property. The only change we need to make to the header block shown below is to set the ‘typeName’ JSON element correctly. In our case, we want to add a property defined for all Hive tables, so we have set the typeName to ‘hive_table’.

{
  "enumTypes": [],
  "structTypes": [],
  "traitTypes": [],
  "classTypes": [
    {
      "superTypes": ["DataSet"],
      "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
      "typeName": "hive_table",
      "typeDescription": null,

Keep in mind that all the JSON elements shown above pertain to the Atlas type which we plan to modify.

Define the new Atlas Property

For this example, we are adding a property called ‘DataOwner’, which is intended to hold the owner of the data from a governance perspective. For our purposes, we have the following requirements:

Requirement | Attribute | Property Assignment
The property is searchable | isIndexable | true
The property will contain a string | dataTypeName | string
Not all Hive tables will have an owner | multiplicity | optional
A data owner can be assigned to multiple Hive tables | isUnique | false

Based on the above requirements, we end up with a property definition as shown below:

{
  "name": "DataOwner",
  "dataTypeName": "string",
  "multiplicity": "optional",
  "isComposite": false,
  "isUnique": false,
  "isIndexable": true,
  "reverseAttributeName": null
},

As shown in the full JSON file, it is possible to define multiple properties at one time, so take your time and try to define all of your properties in a single update.
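The assembly of the three JSON parts can be sketched in Python (the language used in the associated repo). The abbreviated `existing` list below is only a placeholder — the real list of required existing attributes must come from your cluster:

```python
import json

# Part 1 -- the header: identifies the Atlas type being updated.
# Only "typeName" changes between examples.
header = {
    "enumTypes": [],
    "structTypes": [],
    "traitTypes": [],
    "classTypes": [{
        "superTypes": ["DataSet"],
        "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
        "typeName": "hive_table",
        "typeDescription": None,
    }],
}

# Part 2 -- the new custom property, matching the requirements table above.
data_owner = {
    "name": "DataOwner",
    "dataTypeName": "string",
    "multiplicity": "optional",
    "isComposite": False,
    "isUnique": False,
    "isIndexable": True,
    "reverseAttributeName": None,
}

# Part 3 -- the required existing 'hive_table' attributes. Only one placeholder
# entry is shown here; the full list depends on your HDP version.
existing = [
    {"name": "name", "dataTypeName": "string", "multiplicity": "optional",
     "isComposite": False, "isUnique": False, "isIndexable": True,
     "reverseAttributeName": None},
]

# Combine the three parts into the final update payload.
payload = header
payload["classTypes"][0]["attributeDefinitions"] = [data_owner] + existing

print(json.dumps(payload, indent=2))
```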

Make certain you include the existing properties

An annoying thing about the Atlas v1 REST API is the need to include some of the type’s other key properties in your JSON file. For this example, which was run on HDP 2.5.3, I had to define a long list of existing properties, and every time you add a new custom property you must include your previously added custom properties in the JSON as well. If you check out the JSON file used for this example, you will find the full list of properties required as of HDP 2.5.0.

Step 2: PUT the Atlas property update

We now have the full JSON request constructed with our new property requirements, so it is time to PUT the JSON file using the Atlas REST API v1. For the text of this article I am using ‘curl’ to make the example clearer, though the associated repo uses Python to make life a little easier.

To execute the PUT REST request we will first need to collect the following data elements:

Data Element | Where to find it
Atlas Admin User Id | A defined ‘administrative’ user for the Atlas system. It is the same user id you use to log into Atlas.
Atlas Password | The password associated with the Atlas Admin User Id.
Atlas Server | The Atlas Metadata Server. This can be found by selecting the Atlas service in Ambari and looking in the Summary tab.
Atlas Port | Normally 21000. Check the Ambari Atlas configs for the specific port in your cluster.
update_hive_table_type.json | The name of the JSON file containing our new Atlas property definition.
curl -iv -d @update_hive_table_type.json \
  --header "Content-Type: application/json" \
  -u {Atlas Admin User Id}:{Atlas Password} \
  -X PUT http://{Atlas Server}:{Atlas Port}/api/atlas/types
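If you prefer Python over curl, the same PUT request can be sketched with the standard library alone. The host, port, and credentials below are placeholders, and the payload is abbreviated — in practice you would load the full update_hive_table_type.json:

```python
import base64
import json
import urllib.request

# Placeholder connection details -- substitute your cluster's values.
atlas_user, atlas_password = "admin", "admin"
atlas_server, atlas_port = "atlas.example.com", 21000

# Abbreviated payload; in practice, read update_hive_table_type.json instead.
payload = {"enumTypes": [], "structTypes": [], "traitTypes": [], "classTypes": []}
body = json.dumps(payload).encode("utf-8")

url = f"http://{atlas_server}:{atlas_port}/api/atlas/types"
auth = base64.b64encode(f"{atlas_user}:{atlas_password}".encode()).decode()

# Build the PUT request with basic authentication, mirroring the curl command.
request = urllib.request.Request(
    url,
    data=body,
    method="PUT",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {auth}",
    },
)

# Uncomment to actually send the request to a live cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```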

If all is successful, we should see a result like the one shown below. The only thing you need to verify in the result (other than the absence of any reported errors) is that the “name” element matches the Atlas type to which you are adding the new custom property.

{
  "requestId": "qtp1177377518-235-fcf1c6f4-5993-49ac-8f5b-cdaafd01f2c0",
  "types": [
    {
      "name": "hive_table"
    }
  ]
}

However, if you are like me, you will probably make a couple of mistakes along the way. To help you identify the root cause of your errors, here is a short list of common errors and how to resolve them:

Error #1: Missing a necessary Atlas property for the Type

An error like the one shown below occurs because the JSON containing your new custom property is missing an existing property.

{
  "error": "hive_table can't be updated - Old Attribute stats:numRows is missing",
  "stackTrace": "org.apache.atlas.typesystem.types.TypeUpdateException: hive_table can't be updated - Old Attribute stats:numRows is missing\n\tat ..."
}

The solution is to add that property alongside your custom property in the JSON file. If you are uncertain of the exact definition for the property, execute the Atlas REST API GET call shown below to list out the Atlas type whose properties you are modifying:

curl -u {Atlas Admin User Id}:{Atlas Password} -X GET http://{Atlas Server}:{Atlas Port}/api/atlas/types
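When working through a long list of missing attributes, a small helper can pull the attribute name out of each error message. This is a sketch that assumes the v1 error format shown above:

```python
import re

def missing_attribute(error_message):
    """Extract the attribute name from an Atlas v1 'Old Attribute ... is missing' error."""
    match = re.search(r"Old Attribute (\S+) is missing", error_message)
    return match.group(1) if match else None

print(missing_attribute(
    "hive_table can't be updated - Old Attribute stats:numRows is missing"
))  # -> stats:numRows
```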

Error #2: Unknown datatype:

You encounter an error like the one below:

{
  "error": "Unknown datatype: XRAY",
  "stackTrace": "org.apache.atlas.typesystem.exception.TypeNotFoundException: Unknown ..."
}

In this case, you have entered an incorrect Atlas data type. The allowed data types include:

  • byte
  • short
  • int
  • long
  • float
  • double
  • biginteger
  • bigdecimal
  • date
  • string
  • {custom types}

The {custom types} entry enables you to reference another Atlas type. So, for example, if you decide to create a ‘SecurityRules’ Atlas type which itself contains a list of properties, you would just insert the SecurityRules type name as the property’s data type.
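A property referencing that hypothetical ‘SecurityRules’ type would look like the sketch below; the property name is made up for illustration:

```python
import json

# Hypothetical property whose data type is the custom 'SecurityRules' Atlas type.
applied_rules = {
    "name": "AppliedSecurityRules",   # illustrative property name
    "dataTypeName": "SecurityRules",  # a custom type name replaces a primitive here
    "multiplicity": "optional",
    "isComposite": False,
    "isUnique": False,
    "isIndexable": False,
    "reverseAttributeName": None,
}

print(json.dumps(applied_rules, indent=2))
```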

Error #n: You added a new Atlas property for a type incorrectly and need to delete it

This is the reason why you ALWAYS want to modify Atlas types and properties in a sandbox development environment. DO NOT EXPERIMENT WITH CUSTOMIZING ATLAS TYPES IN PRODUCTION!!!!! If you ignore this standard step of most organizations’ SDLC, your solution is to delete the Atlas service from within Ambari, re-add the service, and then re-add all of your data. Not fun.

Step 3: Check out the results

Our new custom Atlas ‘hive_table’ property is now visible in the Atlas UI for all tables. As the property was just defined for all ‘hive_table’ data assets, its value is null. Your next step, which is covered in the article Modify Atlas Entity properties using REST API commands, is to assign a value to the new property.
