Apache Atlas Tracking Lineage Not working as Expected

New Contributor

In Apache Atlas, I am trying to model the data flow of different processes. The issue I am having is that some of these processes share common DataSets but I don't necessarily want the different processes I am modeling to appear to be connected to each other.

For example, in this lineage model, I want to show an XML data source file going into a process whose output is transferred to another computer:

{
  "entity": {
    "typeName": "datasystem_datatransfer",
    "attributes": {
      "id": "b75af137-9279-4c73-be9f-0e37b686dde5",
      "qualifiedName": "b75af137-9279-4c73-be9f-0e37b686dde5@datasystem_datatransfer",
      "displayName": "Data Transfer Use Case 1",
      "inputs": [
        {
          "uniqueAttributes": {"qualifiedName": "25b60fe5-891c-4c94-87ab-b075d838ec30@datasystem_datasource"},
          "typeName": "datasystem_datasource"
        }
      ],
      "outputs": [
        {
          "uniqueAttributes": {"qualifiedName": "21781e1b-4b94-435b-be0a-141776267c4e@datasystem_computer"},
          "typeName": "datasystem_computer"
        }
      ],
      "description": "Data transfer from Data Source to Computer.",
      "name": "dataEgressUseCase1"
    }
  }
}
This will create a model like this:

datasystem_datasource --> datasystem_datatransfer --> datasystem_computer

I now have another process I want to model that uses the same "datasystem_computer", but the process is a bit more complicated:

{
  "entities": [
    {
      "typeName": "datasystem_datatransfer",
      "attributes": {
        "id": "1305f6c4-f0da-4929-be21-dd0798dc2086",
        "qualifiedName": "1305f6c4-f0da-4929-be21-dd0798dc2086@datasystem_datatransfer",
        "displayName": "Data Transfer Use Case 2",
        "inputs": [
          {
            "uniqueAttributes": {"qualifiedName": "c72375fb-34a5-4a22-895c-0d55435fdf26@datasystem_datasource"},
            "typeName": "datasystem_datasource"
          }
        ],
        "outputs": [
          {
            "uniqueAttributes": {"qualifiedName": "21781e1b-4b94-435b-be0a-141776267c4e@datasystem_computer"},
            "typeName": "datasystem_computer"
          }
        ],
        "description": "Data Transfer from Data Source to PC.",
        "name": "dataEgressUseCase2"
      }
    },
    {
      "typeName": "datasystem_datatransfer",
      "attributes": {
        "id": "307e6f84-41af-482e-8641-39fa258e709d",
        "qualifiedName": "307e6f84-41af-482e-8641-39fa258e709d@datasystem_datatransfer",
        "displayName": "Data Transfer Use Case 2.5",
        "inputs": [
          {
            "uniqueAttributes": {"qualifiedName": "21781e1b-4b94-435b-be0a-141776267c4e@datasystem_computer"},
            "typeName": "datasystem_computer"
          }
        ],
        "outputs": [
          {
            "uniqueAttributes": {"qualifiedName": "5acddaca-6eb8-48f9-be75-fc757e442985@datasystem_datasource"},
            "typeName": "datasystem_datasource"
          }
        ],
        "description": "Data Transfer from Data Source to PC to Another PC.",
        "name": "dataEgressUseCase2.5"
      }
    }
  ]
}
This should create a lineage diagram like:

datasystem_datasource --> datasystem_datatransfer --> datasystem_computer --> datasystem_datatransfer --> datasystem_datasource

The problem is that when I create this lineage, it changes the first lineage I created. They have different IDs, so I am not sure why creating this second lineage would impact the first. I realize that they share the same datasystem_computer node, but they are different processes. What am I doing wrong?
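
The change to the first diagram can be reproduced in a small model. As far as I understand it, Atlas draws lineage by following process inputs and outputs from entity to entity, so every process that references the same entity lands in one connected graph. The sketch below is not Atlas code and calls no Atlas API. It is a minimal Python model of that traversal, and the short labels such as `source_25b6` and `computer_2178` are hypothetical stand-ins for the qualifiedName values in the payloads above.

```python
from collections import defaultdict, deque

# Process entities from the question, reduced to (name, inputs, outputs).
# The short labels are illustrative stand-ins for the qualifiedName values.
PROCESSES = [
    ("dataEgressUseCase1", ["source_25b6"], ["computer_2178"]),
    ("dataEgressUseCase2", ["source_c723"], ["computer_2178"]),
    ("dataEgressUseCase2.5", ["computer_2178"], ["source_5acd"]),
]


def build_edges(processes):
    """Return a directed adjacency map: input -> process -> output."""
    edges = defaultdict(set)
    for name, inputs, outputs in processes:
        for src in inputs:
            edges[src].add(name)
        for dst in outputs:
            edges[name].add(dst)
    return edges


def downstream(edges, start):
    """Breadth-first walk collecting every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Only use case 1 registered: the walk stops at the computer.
before = downstream(build_edges(PROCESSES[:1]), "source_25b6")
# All three registered: the walk continues through the shared computer.
after = downstream(build_edges(PROCESSES), "source_25b6")

print(sorted(before))
print(sorted(after))
```

In this model the downstream lineage of the first data source grows once use case 2.5 is registered, because the walk passes through the shared computer entity. If the two flows must stay visually separate, one option to try (an assumption, not something confirmed in this thread) is to give each use case its own computer entity with a distinct qualifiedName.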

1 REPLY

Expert Contributor

Hello @DreamDelerium, thanks for sharing this question with us. From what I checked, the lineage data for both data sources should be the same: datasystem_datatransfer is part of the datasystem_datasource lineage, so the lineage data has the same origin. As for your first question, why would creating the second lineage impact the first? It should not impact it.

Please let me know if you need any clarification on the above.