How do I use the data classification feature of Atlas? Is it implemented through tags?

New Contributor

Question 1:

According to the official Atlas guide, the Data Classification feature is described as follows:

  • Import or define taxonomy business-oriented annotations for data
  • Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes

I am wondering how to define taxonomy annotations for data. Is it done only through tags?

~

Question 2:

How can the relationships between data sets be captured automatically?

~

Question 3:

Besides the Atlas Web UI, what other ways are there to add a tag to data?

Re: How to use the function of data classification of atlas? And is it implemented through Tags ?

Guru

@Ethan Hsieh

Here is an example of how to define a new type that inherits from an existing type. In this case, the new type, nifi_flow, represents a NiFi flow and inherits from Process.

{
  "enumTypes": [],
  "structTypes": [],
  "traitTypes": [],
  "classTypes": [
    {
      "superTypes": ["Process"],
      "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
      "typeName": "nifi_flow",
      "attributeDefinitions": [
        {
          "name": "nodes",
          "dataTypeName": "string",
          "multiplicity": "optional",
          "isComposite": false,
          "isUnique": false,
          "isIndexable": true,
          "reverseAttributeName": null
        },
        {
          "name": "flow_id",
          "dataTypeName": "string",
          "multiplicity": "optional",
          "isComposite": false,
          "isUnique": false,
          "isIndexable": true,
          "reverseAttributeName": null
        }
      ]
    }
  ]
}

Notice that you can create complex inheritance structures that reflect organizational structures and artifacts.

Once the type is defined, save it to a file (type.json here) and push it to Atlas via the REST API:

curl -d @type.json -X POST -H "Content-Type: application/json" http://localhost:21000/api/atlas/types

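You create tags (traits) with a similar JSON descriptor, but you put the definitions in the traitTypes array instead of the classTypes array. You can now create entities that are based on the new type and possess the new traits (tags). The entity below is based on nifi_flow; its traitNames and traits are left empty, so it carries no tags yet.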

{
    "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
    "id": {
      "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
      "id": "9be7d4bd-7aa1-4bac-809e-9db706a04262",
      "version": 0,
      "typeName": "nifi_flow"
    },
    "typeName": "nifi_flow",
    "values": {
      "name": "NiFi Flow_7c84501d-d10c-407c-b9f3-1d80e38fe36a_c12a8df2-1a01-4a1e-b841-b83f2941c587_db3a3b5a-9ccd-460c-9e69-cb6b3f8c3ecf",
      "description": "[ListenHTTP:RECEIVE, EvaluateJsonPath:ATTRIBUTES_MODIFIED, UpdateAttribute:ATTRIBUTES_MODIFIED, RouteOnAttribute:ROUTE, PutKafka:SEND, PutKafka:DROP]",
      "outputs": [
        {
          "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id": "db3a3b5a-9ccd-460c-9e69-cb6b3f8c3ecf",
          "version": 0,
          "typeName": "DataSet"
        }
      ],
      "inputs": [
        {
          "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
          "id": "c12a8df2-1a01-4a1e-b841-b83f2941c587",
          "version": 0,
          "typeName": "DataSet"
        }
      ],
      "nodes": null,
      "flow_id": "7c84501d-d10c-407c-b9f3-1d80e38fe36a"
    },
    "traitNames": [],
    "traits": {}
}

With the entity defined, post it to Atlas:

curl -d @entity.json -X POST -H "Content-Type: application/json" http://localhost:21000/api/atlas/entities

You should now be able to find the new nifi_flow entity by searching for it in Atlas. By defining new types that reflect an organization's documents, processes, and artifacts, and then creating entities based on those types, you can implement data governance around any organizational structure.

Notice that the inputs and outputs fields contain entity ids. When you set entity ids in the inputs and outputs fields, you establish relationships between those entities, and the lineage graphs then reflect those relationships.

You can add new types, entities, and tags via the REST API or the Java API.
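As a rough sketch of the tag side of this over REST: the script below writes a trait type descriptor and a trait instance to files and wraps the two curl calls in functions. It assumes the same legacy Atlas REST API used in the examples in this answer. The tag name "PII", the file names, the ATLAS_URL variable, and the function names are made up for illustration, and the /entities/{guid}/traits endpoint should be checked against the documentation for your Atlas version before you rely on it. The entity id is the nifi_flow id from the entity example.

```shell
#!/bin/sh
# Assumptions (not from the original post): the tag name "PII", the file
# names, ATLAS_URL, and the function names are illustrative only.
ATLAS_URL="${ATLAS_URL:-http://localhost:21000}"
ENTITY_GUID="9be7d4bd-7aa1-4bac-809e-9db706a04262"  # nifi_flow entity id

# 1. Trait (tag) type: same descriptor layout as the class type, but the
#    definition goes in "traitTypes" and "classTypes" stays empty.
cat > trait_type.json <<'EOF'
{
  "enumTypes": [],
  "structTypes": [],
  "traitTypes": [
    {
      "superTypes": [],
      "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.TraitType",
      "typeName": "PII",
      "attributeDefinitions": []
    }
  ],
  "classTypes": []
}
EOF

# 2. Trait instance to attach to an existing entity.
cat > trait_instance.json <<'EOF'
{
  "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
  "typeName": "PII",
  "values": {}
}
EOF

# The requests are wrapped in functions so nothing is sent until you call them.
register_trait_type() {
  curl -d @trait_type.json -X POST -H "Content-Type: application/json" \
    "$ATLAS_URL/api/atlas/types"
}

attach_trait() {
  curl -d @trait_instance.json -X POST -H "Content-Type: application/json" \
    "$ATLAS_URL/api/atlas/entities/$ENTITY_GUID/traits"
}
```

To use it, source the script and call register_trait_type once, then attach_trait for each entity you want to tag. Registering the type first matters, since the trait instance refers to the type by name.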

Re: How to use the function of data classification of atlas? And is it implemented through Tags ?

New Contributor

Hey @Ethan Hsieh,

Great example.

In our org we are planning to capture end-to-end lineage. We use NiFi for the data flow and then store the data in S3 or HDP, depending on the requirement. Is it possible to create a NiFi flow in Atlas to capture the data lineage of NiFi as well? Awaiting your response.

Thank you,

Subash

Re: How to use the function of data classification of atlas? And is it implemented through Tags ?

New Contributor

I tried to create a type in Atlas 0.7, but I am not able to create it. The error says "not able to deserialize json".
