Member since 10-17-2016 | 93 posts | 10 kudos received | 3 solutions
My Accepted Solutions
Views | Posted
---|---
4885 | 09-28-2017 04:38 PM
7332 | 08-24-2017 06:12 PM
1899 | 07-03-2017 12:20 PM
12-03-2017
02:18 PM
@Ashutosh Mestry any thoughts?
11-29-2017
12:06 PM
Atlas is a governance tool. Two of the key pillars of data governance are accountability and meeting compliance requirements. To establish accountability and traceability, tools usually support lineage information, which helps answer questions such as where the data came from, who modified it, and how it was modified. Compliance requirements in industries such as healthcare and finance can be very strict: the origins of the data must be known without any ambiguity. Since Atlas claims to help organizations meet their compliance requirements, consider the scenario presented in the attached figure.
[Figure: lineage-accountability.png]
In the figure, a process reads a few data items and then writes them to two different databases. Atlas can capture cross-component lineage and will record the inputs and the outputs of the process. How can we determine which input went to which database? For example, all records from data item 1 might be written to database 2, while the remaining two data items are written to database 1. In that case the lineage is ambiguous: all I would know is that the data in either database could have come from any of the data sources. Would such information be enough to meet compliance requirements?
My second question is about performance. Currently Kafka does not support Atlas V2, so when developing the Spark Atlas add-on I used the REST API to post the entities. Since I am also handling Spark Streaming, the number of entity notifications can be high. Could I run into scalability issues in such a scenario? Approximately what rate can the REST API handle before messages are dropped?
Thanks in advance.
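A minimal sketch of the ambiguity, assuming lineage is recorded only as a process's full set of inputs and full set of outputs. The names `item1`, `db1`, and so on are hypothetical stand-ins for the figure's data items and databases. With one process, every input is a possible source of each output; splitting the work into one process entity per output (a modelling choice, not something Atlas does automatically) narrows it down.

```python
from typing import Dict, List, Set

# Hypothetical model: each process lists its inputs and outputs, mirroring
# the coarse-grained inputs/outputs recorded per process.
Process = Dict[str, List[str]]

def possible_sources(processes: List[Process], target: str) -> Set[str]:
    """Return every input that lineage alone says *could* have fed `target`."""
    sources: Set[str] = set()
    for proc in processes:
        if target in proc["outputs"]:
            sources.update(proc["inputs"])
    return sources

# One process writing to both databases: the inputs cannot be told apart.
single = [{"inputs": ["item1", "item2", "item3"], "outputs": ["db1", "db2"]}]

# One process per output: item1 -> db2, items 2 and 3 -> db1.
split = [
    {"inputs": ["item2", "item3"], "outputs": ["db1"]},
    {"inputs": ["item1"], "outputs": ["db2"]},
]

print(sorted(possible_sources(single, "db2")))  # ['item1', 'item2', 'item3']
print(sorted(possible_sources(split, "db2")))   # ['item1']
```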
Labels: Apache Atlas
11-09-2017
07:04 PM
Hi @Ashutosh Mestry, my first question is why Atlas doesn't show the entire lineage in one go. Have a look at the attached pictures; they represent a single chain. Notice that the first lineage view ends at rdd 3, and I then have to open rdd 3 to see what happens further. Can it not display the entire chain at once? What determines how much of the chain is shown from a given entity?
[Screenshots: screenshot-from-2017-11-09-19-58-36.png, screenshot-from-2017-11-09-19-59-42.png, screenshot-from-2017-11-09-20-00-08.png]
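A hypothetical illustration of how a view can stop partway along a chain: if the graph is rendered only up to a fixed number of hops from the selected entity, opening rdd 3 re-centres the view and reveals the next hops. This assumes a hop limit is the cause; how Atlas actually counts hops (for example, whether process and dataset nodes each count) is not modelled here, and the rdd names are placeholders.

```python
from collections import deque
from typing import Dict, List, Set

def visible_lineage(edges: Dict[str, List[str]], start: str, depth: int) -> Set[str]:
    """Nodes reachable from `start` in at most `depth` hops (breadth-first)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # hop limit reached: the rendered graph would end here
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen

# A single linear chain, like the one in the screenshots.
chain = {"rdd1": ["rdd2"], "rdd2": ["rdd3"], "rdd3": ["rdd4"], "rdd4": ["rdd5"]}
print(sorted(visible_lineage(chain, "rdd1", 2)))  # ['rdd1', 'rdd2', 'rdd3']
print(sorted(visible_lineage(chain, "rdd3", 2)))  # ['rdd3', 'rdd4', 'rdd5']
```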
11-04-2017
06:44 PM
What determines how much lineage is displayed? I have huge lineage diagrams, but Atlas seems to randomly choose to show parts of the tree at different points. Shouldn't it show the entire lineage tree if I am at the root dataset? Also, Atlas seems to get stuck when a lineage diagram contains 200+ entities: the loading wheel spins forever. Thanks.
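The V2 REST API exposes lineage at `/api/atlas/v2/lineage/{guid}` with `direction` and `depth` query parameters, so a larger depth can be requested explicitly and the result inspected outside the UI, which may help when the graph has 200+ entities. A minimal sketch: the base URL and GUID are placeholders, the default depth of 3 in this function is an assumption, and the response field names (`guidEntityMap`, `relations`) are those of the lineage info object as I understand it, so verify them against your Atlas version.

```python
from typing import Any, Dict
from urllib.parse import quote, urlencode

def lineage_url(base: str, guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Build a V2 lineage request URL; `direction` is INPUT, OUTPUT or BOTH."""
    query = urlencode({"direction": direction, "depth": depth})
    return f"{base}/api/atlas/v2/lineage/{quote(guid)}?{query}"

def summarize(lineage: Dict[str, Any]) -> Dict[str, int]:
    """Count entities and edges in a lineage response (assumed field names)."""
    return {
        "entities": len(lineage.get("guidEntityMap", {})),
        "relations": len(lineage.get("relations", [])),
    }

url = lineage_url("http://localhost:21000", "some-guid", depth=10)
print(url)
# http://localhost:21000/api/atlas/v2/lineage/some-guid?direction=BOTH&depth=10
```

Comparing `summarize` output for increasing depths shows how much of the graph each depth actually returns.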
Labels: Apache Atlas
10-23-2017
07:13 PM
Hi, I have a scenario where, after reading JSON files, I run InvokeHTTP against a url attribute in each JSON file. This returns a further list of JSON objects, each with a url attribute, which I then split, running InvokeHTTP individually against each url to get a result. The problem is that, at the end, for each flow resulting from the initial JSON I need a composed JSON containing the JSON I read from the file, the later JSON objects, and the final result received after hitting the individual url. I need to save this composed JSON as a record in MongoDB. I'm having trouble building this JSON, so I need help with the flow and processors. Thanks.
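Independent of which NiFi processors end up doing the work, the target document can be sketched as a merge of the three pieces: the original file JSON, one split item, and that item's final result. A minimal sketch in Python; the field names `source`, `item`, `result` and the example URLs are hypothetical. It assumes all three pieces can be made available together at the final step (for instance by carrying them along with each flow file), which is the part the flow itself has to solve.

```python
import json
from typing import Any, Dict, List

def compose_records(source: Dict[str, Any],
                    items: List[Dict[str, Any]],
                    results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one document per split item: original JSON + item + its final result.

    `results` maps each item's "url" to the response fetched from that URL;
    items without a response get a null result.
    """
    return [
        {"source": source, "item": item, "result": results.get(item["url"])}
        for item in items
    ]

source = {"id": 1, "url": "http://example.com/list"}                        # read from file
items = [{"url": "http://example.com/a"}, {"url": "http://example.com/b"}]  # first HTTP call
results = {"http://example.com/a": {"status": "ok"}}                        # per-URL calls

for doc in compose_records(source, items, results):
    print(json.dumps(doc))  # each line is one document to insert into MongoDB
```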
Labels: Apache NiFi
10-12-2017
11:48 PM
1 Kudo
Hi, I have the following type defined in Atlas. Notice that it extends both DataSet and Process. You can post this type definition with Postman using this URL: http://localhost:21000/api/atlas/v2/types/typedefs
{
"structDefs":[],
"classificationDefs": [],
"entityDefs" : [ {
"name": "spark_testStage",
"superTypes" : ["Process",
"DataSet"],
"attributeDefs" : [
{
"name" : "test",
"typeName": "string",
"isOptional" : true,
"cardinality": "SINGLE",
"isIndexable": false,
"isUnique": false
},
{
"name": "description",
"typeName": "string",
"cardinality": "SINGLE",
"isIndexable": true,
"isOptional": true,
"isUnique": false
} ]
}]
}
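The same typedef POST can be prepared from a script instead of Postman. A minimal sketch using only the Python standard library; the `admin`/`admin` credentials and basic authentication are assumptions about the local setup, and the payload is trimmed to the type name and supertypes. It spells the key `structDefs`, which is the field name in Atlas's typedef model. The request is only built here; `urllib.request.urlopen(req)` would send it to a running server.

```python
import base64
import json
import urllib.request

TYPEDEFS_URL = "http://localhost:21000/api/atlas/v2/types/typedefs"

def typedef_request(payload: dict, user: str, password: str) -> urllib.request.Request:
    """Prepare (without sending) a JSON POST of a typedef payload with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        TYPEDEFS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )

payload = {
    "structDefs": [],
    "classificationDefs": [],
    "entityDefs": [{"name": "spark_testStage",
                    "superTypes": ["Process", "DataSet"],
                    "attributeDefs": []}],
}
req = typedef_request(payload, "admin", "admin")  # hypothetical credentials
```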
Now I create the following entities using this URL: http://localhost:21000/api/atlas/v2/entity
{
"referredEntities":
{
"-208942807557405": {
"typeName": "spark_testStage",
"attributes": {
"owner": "spark",
"qualifiedName": "Stage6@clusterName",
"name": "Stage6",
"description": "this is attribute is inclued due to inheritance"
},
"guid": "-208942807557405",
"version": 0,
"inputs":
[
{
"typeName": "spark_testStage",
"attributes":
{
"source": "testing",
"description": null,
"qualifiedName": "Stage5@clusterName",
"name": "Stage5",
"owner": null,
"destination": "hdfs://vimal-fenton-4-1.openstacklocal:80ion"
},
"guid": "-208942807557404"
}
],
"classifications": []
}
},
"entity":
{
"guid": "-208942807557404",
"status": "ACTIVE",
"version": 0,
"typeName": "spark_testStage",
"attributes" :
{
"qualifiedName" : "Stage5@clusterName",
"name" : "Stage5",
"test" : "this is source",
"description" : "source",
"outputs":
[
{
"typeName": "spark_testStage",
"attributes":
{
"source": "testing",
"description": null,
"qualifiedName": "Stage6@clusterName",
"name": "Stage6",
"owner": null,
"destination": "hdfs://vimal-fenton-4-1.openstacklocal:8020/apps/hive/warehouse/destination"
},
"guid": "-208942807557405"
}
]
},
"classifications": []
}
}
Why doesn't Atlas draw a lineage edge between the two entities? The request succeeds, but I don't see any lineage. I can see that Stage5 has Stage6 in its outputs and is correctly linked: I can click the link through to Stage6. However, Stage6 does not have Stage5 as its input.
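One difference between the two entities in the request above may matter: in Stage5, `outputs` sits inside `attributes`, while in Stage6, `inputs` is a sibling of `attributes`. Process `inputs` and `outputs` are type attributes, so a key at the entity level is probably not read as one (an assumption worth verifying against your Atlas version). A small hypothetical checker that flags such keys; the set of entity-level keys below covers only the fields used in this post, not every field the V2 entity model has.

```python
from typing import Any, Dict, List

# Entity-level keys used in this post's payload (not type attributes).
# Assumption: this list is incomplete for the full V2 entity model.
ENVELOPE_KEYS = {"typeName", "attributes", "guid", "status", "version", "classifications"}

def misplaced_keys(entity: Dict[str, Any]) -> List[str]:
    """Keys sitting beside `attributes` that probably belong inside it."""
    return sorted(k for k in entity if k not in ENVELOPE_KEYS)

def check(payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each entity's name to its suspicious entity-level keys."""
    entities = [payload["entity"], *payload.get("referredEntities", {}).values()]
    return {e["attributes"]["name"]: misplaced_keys(e) for e in entities}

# Trimmed version of the request above.
payload = {
    "entity": {"guid": "-208942807557404", "status": "ACTIVE", "version": 0,
               "typeName": "spark_testStage",
               "attributes": {"name": "Stage5", "outputs": []},
               "classifications": []},
    "referredEntities": {
        "-208942807557405": {"typeName": "spark_testStage",
                             "attributes": {"name": "Stage6"},
                             "guid": "-208942807557405", "version": 0,
                             "inputs": [], "classifications": []}},
}
print(check(payload))  # {'Stage5': [], 'Stage6': ['inputs']}
```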
Labels: Apache Atlas
10-05-2017
05:01 PM
Hi @Ashutosh Mestry, I was using the JSON you provided in the zip file. I tried the other file, which works fine. The zip file probably contains the response returned once the entities are created. Thanks.
10-04-2017
10:06 AM
Thank you @Ashutosh Mestry for such a detailed response. It helps a lot! Unfortunately, the Hive entities are not created. I copied your Hive table entities JSON into Postman and I get the following error:
{
"errorCode": "ATLAS-404-00-00A",
"errorMessage": "Referenced entity 027a987e-867a-4c98-ac1e-c5ded41130d3 is not found"
}
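The error says a GUID referenced in the request does not exist on this Atlas server, which is expected if the JSON came from another Atlas instance. A hypothetical helper that lists referenced GUIDs the payload does not define itself, assuming (as in the negative GUIDs of the earlier Stage5/Stage6 request) that negative GUIDs are request-local placeholders. Any GUID it reports would need to exist on the server first, or be referenced another way, such as by `typeName` plus `uniqueAttributes`. The sample payload is made up apart from the GUID in the error.

```python
from typing import Any, Dict, Iterator, Set

def referenced_guids(node: Any) -> Iterator[str]:
    """Yield every "guid" string found anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        if isinstance(node.get("guid"), str):
            yield node["guid"]
        for value in node.values():
            yield from referenced_guids(value)
    elif isinstance(node, list):
        for item in node:
            yield from referenced_guids(item)

def must_already_exist(payload: Dict[str, Any]) -> Set[str]:
    """Non-placeholder GUIDs referenced in the payload but not defined by it."""
    top = [payload["entity"]] if "entity" in payload else []
    top.extend(payload.get("entities", []))
    top.extend(payload.get("referredEntities", {}).values())
    defined = {e["guid"] for e in top if "guid" in e}
    referenced = set(referenced_guids(payload))
    return {g for g in referenced - defined if not g.startswith("-")}

sample = {"entities": [{
    "guid": "-1", "typeName": "hive_table",
    "attributes": {
        "db": {"guid": "027a987e-867a-4c98-ac1e-c5ded41130d3", "typeName": "hive_db"},
        "columns": [{"guid": "-2", "typeName": "hive_column"}]}}]}
print(must_already_exist(sample))  # {'027a987e-867a-4c98-ac1e-c5ded41130d3'}
```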