
Apache Atlas - How to show Process in lineage graph when it has only inputs



Hi all,

I'm trying to map dependencies between a set of Datasets and a set of Processes in Apache Atlas.

The scenario is as follows.


Process p1:

input datasets: d1, d2, d3

output datasets: d4, d5


Process p2:

input datasets: d1, d4

output datasets: d6


Process p3:

input datasets: d6

output datasets: none
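
For anyone who wants to reproduce this, the scenario can be written down as entity payloads roughly like the following. This is only a minimal sketch: it assumes the built-in `Process` and `DataSet` types and the v2 entity JSON layout with an `entities` list, as used for bulk entity creation. The `@demo` qualifiedName suffix and the helper names are hypothetical, and nothing is sent to a server.

```python
import json

# Scenario from the post: process -> (input datasets, output datasets)
SCENARIO = {
    "p1": (["d1", "d2", "d3"], ["d4", "d5"]),
    "p2": (["d1", "d4"], ["d6"]),
    "p3": (["d6"], []),  # p3 consumes d6 and produces nothing
}


def dataset_ref(name):
    # Reference an existing dataset by its unique attribute.
    # The "@demo" suffix is a made-up naming convention for this sketch.
    return {
        "typeName": "DataSet",
        "uniqueAttributes": {"qualifiedName": f"{name}@demo"},
    }


def process_entity(name, inputs, outputs):
    # Build one Process entity with its inputs and outputs attributes.
    return {
        "typeName": "Process",
        "attributes": {
            "qualifiedName": f"{name}@demo",
            "name": name,
            "inputs": [dataset_ref(d) for d in inputs],
            "outputs": [dataset_ref(d) for d in outputs],
        },
    }


def build_payload(scenario):
    # Wrap all process entities in a single request body.
    return {
        "entities": [
            process_entity(name, ins, outs)
            for name, (ins, outs) in scenario.items()
        ]
    }


payload = build_payload(SCENARIO)
# Show the p3 entity: one input (d6) and an empty outputs list.
print(json.dumps(payload["entities"][2], indent=2))
```

The point of interest is the p3 entity, whose `outputs` list is empty while `inputs` still references d6.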


When I look at the lineage of dataset d6, I can clearly see that it is generated by process p2, but I cannot see that it is also an input of process p3.

This seems to be related to the fact that process p3 does not have any output dataset.

However, I would like to see p3 in the lineage of d6, because a relationship between d6 and p3 does exist (d6 is an input of p3).


As far as I understand, a process is shown in lineage ONLY if it has at least one input AND at least one output.

IMHO, this condition is too strict. A process should be shown in a lineage graph whenever it has at least one input OR at least one output.
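
The difference between the two rules can be illustrated on the scenario from this post with a small sketch. It models only the behaviour as understood here, not how Atlas actually computes lineage, and it looks only at processes directly connected to a dataset. All function names are hypothetical.

```python
# Scenario from the post: process -> (input datasets, output datasets)
SCENARIO = {
    "p1": (["d1", "d2", "d3"], ["d4", "d5"]),
    "p2": (["d1", "d4"], ["d6"]),
    "p3": (["d6"], []),
}


def processes_touching(dataset, scenario):
    # Processes that read or write the given dataset.
    return sorted(
        name
        for name, (ins, outs) in scenario.items()
        if dataset in ins or dataset in outs
    )


def visible_strict(process, scenario):
    # Observed behaviour: at least one input AND at least one output.
    ins, outs = scenario[process]
    return bool(ins) and bool(outs)


def visible_relaxed(process, scenario):
    # Proposed behaviour: at least one input OR at least one output.
    ins, outs = scenario[process]
    return bool(ins) or bool(outs)


def lineage_processes(dataset, scenario, rule):
    # Directly connected processes that pass the visibility rule.
    return [p for p in processes_touching(dataset, scenario) if rule(p, scenario)]


print(lineage_processes("d6", SCENARIO, visible_strict))   # ['p2']
print(lineage_processes("d6", SCENARIO, visible_relaxed))  # ['p2', 'p3']
```

Under the strict rule p3 is dropped from the lineage of d6; under the relaxed rule it appears alongside p2.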


Does anyone know whether this is configurable in Apache Atlas 1.1.0, or whether there is a workaround?

Thank you very much,

Alessandro