Why doesn't Atlas draw lineage?

Expert Contributor

Hi

I have the following type defined in Atlas. Notice that it extends both DataSet and Process. You can POST this type definition with Postman to this URL:

http://localhost:21000/api/atlas/v2/types/typedefs

{
  "structDefs": [],
  "classificationDefs": [],
  "entityDefs": [
    {
      "name": "spark_testStage",
      "superTypes": ["Process", "DataSet"],
      "attributeDefs": [
        {
          "name": "test",
          "typeName": "string",
          "isOptional": true,
          "cardinality": "SINGLE",
          "isIndexable": false,
          "isUnique": false
        },
        {
          "name": "description",
          "typeName": "string",
          "cardinality": "SINGLE",
          "isIndexable": true,
          "isOptional": true,
          "isUnique": false
        }
      ]
    }
  ]
}

Now I create the following entities by posting to this URL:

http://localhost:21000/api/atlas/v2/entity

{
  "referredEntities": {
    "-208942807557405": {
      "typeName": "spark_testStage",
      "attributes": {
        "owner": "spark",
        "qualifiedName": "Stage6@clusterName",
        "name": "Stage6",
        "description": "this attribute is included due to inheritance"
      },
      "guid": "-208942807557405",
      "version": 0,
      "inputs": [
        {
          "typeName": "spark_testStage",
          "attributes": {
            "source": "testing",
            "description": null,
            "qualifiedName": "Stage5@clusterName",
            "name": "Stage5",
            "owner": null,
            "destination": "hdfs://vimal-fenton-4-1.openstacklocal:80ion"
          },
          "guid": "-208942807557404"
        }
      ],
      "classifications": []
    }
  },
  "entity": {
    "guid": "-208942807557404",
    "status": "ACTIVE",
    "version": 0,
    "typeName": "spark_testStage",
    "attributes": {
      "qualifiedName": "Stage5@clusterName",
      "name": "Stage5",
      "test": "this is source",
      "description": "source",
      "outputs": [
        {
          "typeName": "spark_testStage",
          "attributes": {
            "source": "testing",
            "description": null,
            "qualifiedName": "Stage6@clusterName",
            "name": "Stage6",
            "owner": null,
            "destination": "hdfs://vimal-fenton-4-1.openstacklocal:8020/apps/hive/warehouse/destination"
          },
          "guid": "-208942807557405"
        }
      ]
    },
    "classifications": []
  }
}


Why doesn't Atlas draw a line between the two entities? The request is successful, but I don't see any lineage. I also see that Stage5 has Stage6 in its outputs and is correctly linked: I can follow this link to Stage6, but Stage6 does not have Stage5 as its input.

1 ACCEPTED SOLUTION

Guru

@Arsalan Siddiqi

First, don't inherit from both Process and DataSet for all entities. Define two types: one called Dataframe (inheriting from DataSet) and the other called SparkProcess (inheriting from Process). Create two entities of the Dataframe type and one entity of the SparkProcess type. Remember, you are defining datasets and the processes that create them from other datasets. So the Stage type is more like a process, and the state of the data (the data structure) is where lineage applies (driven by the processing that happens in the stages).
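
The two-type split described above could be sketched as a typedefs payload like the one below. This is only an illustration, not a tested Atlas setup: the helper name `build_typedefs` is made up for this example, the empty `attributeDefs` lists are an assumption (add your own attributes as needed), and the result is meant to be posted to the same typedefs URL used in the question.

```python
import json


def build_typedefs():
    """Build a typedefs payload with one DataSet subtype and one Process subtype.

    The type names come from the answer above; everything else is a
    minimal assumption for illustration.
    """
    return {
        "structDefs": [],
        "classificationDefs": [],
        "entityDefs": [
            # Dataframe models the data itself, so it extends DataSet only.
            {"name": "Dataframe", "superTypes": ["DataSet"], "attributeDefs": []},
            # SparkProcess models the transformation, so it extends Process only.
            {"name": "SparkProcess", "superTypes": ["Process"], "attributeDefs": []},
        ],
    }


if __name__ == "__main__":
    # Print the JSON body so it can be pasted into Postman.
    print(json.dumps(build_typedefs(), indent=2))
```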

The main issue is most likely that Atlas only draws lineage when a Process entity references at least one entity in its inputs array and at least one entity in its outputs array. So when you create the SparkProcess entity, put the reference to the source Dataframe entity in inputs and the reference to the destination Dataframe entity in outputs. If all goes well, you should be able to click on either Dataframe entity and see lineage. The Process entity itself will not show lineage, since it essentially acts as glue in the creation of lineage rather than being a dependent. But you should see links to each of the Dataframe entities in the inputs and outputs attributes of the Process entity.
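
As a rough sketch of that layout, the snippet below builds an entity payload in which one SparkProcess entity references a source Dataframe in `inputs` and a destination Dataframe in `outputs`, with both references placed inside `attributes`. The entity names, the short negative placeholder GUIDs, and the helper functions are all hypothetical and chosen only for this example. The payload shape (`entity` plus `referredEntities`) mirrors the request in the question.

```python
import json

# Negative GUIDs act as temporary placeholders within a single request
# (the question's payload uses the same convention); values are arbitrary.
SOURCE_GUID = "-1"
DEST_GUID = "-2"
PROCESS_GUID = "-3"


def dataframe(guid, name, cluster):
    """Return a full Dataframe entity (hypothetical names for illustration)."""
    return {
        "typeName": "Dataframe",
        "guid": guid,
        "attributes": {"name": name, "qualifiedName": f"{name}@{cluster}"},
    }


def ref(guid):
    """Return a lightweight reference to a Dataframe defined in referredEntities."""
    return {"typeName": "Dataframe", "guid": guid}


def build_lineage_payload(cluster="clusterName"):
    """Build one process entity that links a source and a destination Dataframe."""
    process = {
        "typeName": "SparkProcess",
        "guid": PROCESS_GUID,
        "attributes": {
            "name": "Stage5to6",
            "qualifiedName": f"Stage5to6@{cluster}",
            # inputs and outputs live inside attributes of the process entity.
            "inputs": [ref(SOURCE_GUID)],
            "outputs": [ref(DEST_GUID)],
        },
    }
    return {
        "entity": process,
        "referredEntities": {
            SOURCE_GUID: dataframe(SOURCE_GUID, "sourceDataframe", cluster),
            DEST_GUID: dataframe(DEST_GUID, "destinationDataframe", cluster),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_lineage_payload(), indent=2))
```

Note that in this sketch only the process entity carries `inputs` and `outputs`; the Dataframe entities do not reference each other directly.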


3 REPLIES

Expert Contributor

@Arsalan Siddiqi I agree with @Vadim Vaks's suggestion on structuring the types.

I will try out your JSON today and see whether I can get better insight into the lineage behavior with this setup.
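
One way to inspect the lineage behavior outside the UI is to query the v2 lineage endpoint for an entity's GUID and check whether any relations come back. The sketch below only builds the URL and inspects a response that has already been parsed into a dict. It assumes the same `localhost:21000` host as the question, and the helper names are invented for this example.

```python
from urllib.parse import urlencode

# Base URL taken from the question; adjust for your own Atlas host.
ATLAS_BASE = "http://localhost:21000/api/atlas/v2"


def lineage_url(guid, direction="BOTH", depth=3):
    """Build the lineage query URL for one entity GUID."""
    query = urlencode({"direction": direction, "depth": depth})
    return f"{ATLAS_BASE}/lineage/{guid}?{query}"


def has_lineage(lineage_response):
    """Return True if the parsed lineage response contains at least one relation."""
    return len(lineage_response.get("relations", [])) > 0
```

An empty `relations` list for a Dataframe entity would suggest that no Process entity lists it in its inputs or outputs.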