Creating Lineage with Apache Atlas and multiple Processes

New Contributor

Hello. I would like more information about how to create lineage in Apache Atlas. Specifically, what is the right way to create lineage that includes multiple Processes and DataSets? Should I just create one entity per Process (using the entity/bulk API endpoint)? This seems to work, but it creates multiple entities instead of a single complex entity:

 

{
  "entities": [
    {
      "typeName": "mysystem_dataMovement",
      "attributes": {
        "id": "1305f6c4-f0da-4929-be21-dd0798dc2086",
        "qualifiedName": "1305f6c4-f0da-4929-be21-dd0798dc2086@mysystem_dataegress",
        "displayName": "Data Egress Use Case 1",
        "inputs": [{
          "uniqueAttributes": {
            "qualifiedName": "c72375fb-34a5-4a22-895c-0d55435fdf26@mysystem_datasource"
          },
          "typeName": "mysystem_datasource"
        }],
        "outputs": [{
          "uniqueAttributes": {
            "qualifiedName": "b8e4ced9-f3f4-451a-8b24-3fa4d7970824@mysystem_computer"
          },
          "typeName": "mysystem_computer"
        }],
        "description": "Data Egress from Data Source to Computer",
        "name": "dataEgressUseCase2"
      }
    },
    {
      "typeName": "mysystem_dataMovement",
      "attributes": {
        "id": "307e6f84-41af-482e-8641-39fa258e709d",
        "qualifiedName": "307e6f84-41af-482e-8641-39fa258e709d@mysystem_dataMovement",
        "displayName": "Data Egress Use Case 2.5",
        "inputs": [{
          "uniqueAttributes": {
            "qualifiedName": "b8e4ced9-f3f4-451a-8b24-3fa4d7970824@dbmesh_meshnode"
          },
          "typeName": "mysystem_computer"
        }],
        "outputs": [{
          "uniqueAttributes": {
            "qualifiedName": "5acddaca-6eb8-48f9-be75-fc757e442985@dbmesh_datasource"
          },
          "typeName": "mysystem_datasource"
        }],
        "name": "dataEgressUseCase2.5"
      }
    }
  ]
}

1 REPLY

Expert Contributor

Hello @DreamDelerium 

Lineage in Apache Atlas is typically built using Process entities that link Input and Output DataSet entities. When you're dealing with multiple processes and datasets, the correct way is to model each logical step as a separate Process entity, with the associated datasets connected as inputs and outputs.

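So yes, creating multiple Process entities is the right approach for modeling complex lineage. Atlas connects them into a single lineage graph wherever the output DataSet of one Process is also an input of another.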
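To illustrate the one-Process-per-step approach, here is a minimal Python sketch that builds a two-step bulk payload and checks that the steps chain through a shared DataSet. It reuses the type names from the question (`mysystem_dataMovement`, `mysystem_datasource`, `mysystem_computer`). The qualifiedName values and the helper names (`dataset_ref`, `process_entity`, `shared_datasets`) are hypothetical. It also assumes the referenced DataSet entities already exist in Atlas and that `mysystem_dataMovement` extends the built-in `Process` type.

```python
import json


def dataset_ref(type_name, qualified_name):
    """Reference an existing DataSet entity by its unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def process_entity(type_name, qualified_name, name, inputs, outputs):
    """Build one Process entity: a single logical step from inputs to outputs."""
    return {
        "typeName": type_name,
        "attributes": {
            "qualifiedName": qualified_name,
            "name": name,
            "inputs": inputs,
            "outputs": outputs,
        },
    }


def shared_datasets(upstream, downstream):
    """Return qualifiedNames that are outputs of `upstream` and inputs of `downstream`."""
    outs = {r["uniqueAttributes"]["qualifiedName"]
            for r in upstream["attributes"]["outputs"]}
    ins = {r["uniqueAttributes"]["qualifiedName"]
           for r in downstream["attributes"]["inputs"]}
    return outs & ins


# Hypothetical qualifiedNames, used only for illustration.
source = dataset_ref("mysystem_datasource", "source-1@mysystem_datasource")
computer = dataset_ref("mysystem_computer", "computer-1@mysystem_computer")
target = dataset_ref("mysystem_datasource", "target-1@mysystem_datasource")

# Step 1 writes to `computer`; step 2 reads from the same reference.
step1 = process_entity("mysystem_dataMovement", "step-1@mysystem_dataMovement",
                       "sourceToComputer", [source], [computer])
step2 = process_entity("mysystem_dataMovement", "step-2@mysystem_dataMovement",
                       "computerToTarget", [computer], [target])

payload = {"entities": [step1, step2]}

# The two steps only join into one lineage path if they share a DataSet.
assert shared_datasets(step1, step2) == {"computer-1@mysystem_computer"}

# JSON body that could be sent to the entity/bulk endpoint from the question.
body = json.dumps(payload, indent=2)
print(body)
```

The shared reference matters because the references are resolved by `qualifiedName`. In the payload from the question, the first Process outputs `b8e4ced9-...@mysystem_computer` while the second takes `b8e4ced9-...@dbmesh_meshnode` as input. If those are meant to be the same entity, the two strings would need to match exactly for the steps to appear as one connected lineage.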