Member since: 03-24-2016
Posts: 184
Kudos Received: 239
Solutions: 39

My Accepted Solutions
Views | Posted |
---|---|
2091 | 10-21-2017 08:24 PM |
1333 | 09-24-2017 04:06 AM |
4919 | 05-15-2017 08:44 PM |
1379 | 01-25-2017 09:20 PM |
4618 | 01-22-2017 11:51 PM |
07-07-2016
08:38 PM
1 Kudo
@milind pandit This is exactly the kind of thing that tags are for. When an entity, for example a Hive table, is tagged, it will show up when you view all entities associated with that tag. With the new Atlas/Ranger integration, you can create security policies that apply only to Hive tables, or even Hive table columns, that carry a given tag. This allows you to control access to the Hive table or its columns simply by adding or removing the tag, or by adding or removing the user groups to whom the tag-based policy applies. For example, using Atlas and Ranger, you can easily keep track of data sets classified as PII and control and audit access to those data sets.
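As a rough sketch of what attaching a tag looks like over the REST API (untested here against a live server; the GUID is a placeholder for your own entity's id, and the PII trait type must already exist in Atlas):

```shell
# Placeholder GUID: substitute the GUID of the hive_table entity you want to tag.
GUID="63683ca8-e5a9-4c4c-b02e-3fe01bfda2a2"

# A trait instance payload in the Atlas v1 serialization format.
PAYLOAD='{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"PII","values":{}}'
echo "$PAYLOAD"

# POST the trait onto the entity (requires a running Atlas instance):
# curl -u admin:admin -H 'Content-Type: application/json' \
#   -X POST -d "$PAYLOAD" \
#   "http://sandbox.hortonworks.com:21000/api/atlas/entities/$GUID/traits"
```

Once the tag is attached, any Ranger tag-based policy referencing PII applies to that entity without touching the resource-based policies.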
07-07-2016
08:18 PM
2 Kudos
@milind pandit Atlas types can be configured with new attributes that are strongly typed collections. These collections can hold references to other Atlas types. Let's say you wanted to associate a DDL document with a Hive table. You could use the REST API to create a new type called DDL and update the hive_table type to include a new attribute called DDL, typed as the DDL type. You would then create a new entity of the DDL type and set its DDL-Text attribute to contain the DDL statement. Finally, you would update the target hive_table entity's new DDL field with the newly created DDL entity that contains the DDL text that created the table. When you search for the target hive_table entity in Atlas, you will see it has a new field, and that field has a link to the DDL entity. You could use this approach to attach any metadata you wish to any existing or custom type.
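A minimal sketch of what the first step might look like, assuming the Atlas 0.6/0.7 v1 types API and following the classTypes/traitTypes definition shape seen elsewhere in this thread (the type name DDL and attribute DDL-Text are the hypothetical names from the description above; untested against a live server):

```shell
# A candidate class-type definition for the hypothetical "DDL" type.
TYPEDEF='{
  "enumTypes": [], "structTypes": [], "traitTypes": [],
  "classTypes": [{
    "superTypes": ["Referenceable"],
    "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
    "typeName": "DDL",
    "typeDescription": "Holds the DDL text that created an entity",
    "attributeDefinitions": [{
      "name": "DDL-Text",
      "dataTypeName": "string",
      "multiplicity": "optional",
      "isComposite": false,
      "isUnique": false,
      "isIndexable": true,
      "reverseAttributeName": null
    }]
  }]
}'
echo "$TYPEDEF"

# Register the new type (requires a running Atlas instance):
# curl -u admin:admin -H 'Content-Type: application/json' \
#   -X POST -d "$TYPEDEF" http://sandbox.hortonworks.com:21000/api/atlas/types
```

Updating hive_table to reference the new type, and then linking entity instances, would follow the same POST-a-JSON-definition pattern.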
07-04-2016
03:59 AM
@Manoj Dhake Download and use Sandbox 2.5; it was just released and runs Atlas 0.7. https://hortonworks.com/downloads/#tech-preview This version of Atlas is much better than 0.5 and 0.6 and should come pre-installed. Also, remember that in order to see lineage for an entity, you need to run a process that creates a new entity or modifies an existing one. The simplest example is running a create table ... select ... from ... SQL statement.
07-04-2016
03:51 AM
@Russell Anderson Could you post this as a new question, with more detail? I want to make sure that the question and its answer are easily searchable.
07-04-2016
03:41 AM
1 Kudo
@Manoj Dhake Assuming you are on Sandbox 2.5 accessing Atlas 0.7, you will need to use -u with the credentials for Atlas. On versions earlier than 0.7, you do not need to provide credentials.

1. To get all entities of a particular type, first list the available types:

curl -u admin:admin -X GET http://sandbox.hortonworks.com:21000/api/atlas/types

Atlas should return all of the types:

{"results":["Asset","Process","hive_column","storm_node","storm_bolt","falcon_process","hive_serde","falcon_feed_replication","hbase_table","kafka_topic","hive_table","hive_storagedesc","sqoop_dbdatastore","fs_permissions","hive_principal_type","jms_topic","hive_process","falcon_cluster","storm_spout","Referenceable","falcon_feed_creation","falcon_feed","hdfs_path","sqoop_process","Infrastructure","storm_topology","hive_order","DataSet","Taxonomy","fs_path","hive_db","file_action"],"count":32,"requestId":"qtp86996017-77 - d3edd2c6-229c-4344-8745-1a4337856a1f"}

Now request all entities of the hive_table type:

curl -u admin:admin -X GET http://sandbox.hortonworks.com:21000/api/atlas/entities?type=hive_table

Atlas will respond with an array of GUIDs of entities typed as hive_table:

{"requestId":"qtp86996017-144 - 8974d5b6-18fe-41c4-bb86-54f07861560f","typeName":"hive_table","results":["63683ca8-e5a9-4c4c-b02e-3fe01bfda2a2","563e7954-7d4c-45a6-9237-3e94e4d23f68","7751030c-d902-41fd-992d-f209a8e5278e","a9d45d64-0aea-4d18-abcd-919f6a3ae1e7","405079e5-dd29-41dc-98e8-49dc6f70f36d"],"count":5}

To view the entire instance of any of the above entities:

curl -u admin:admin -X GET http://sandbox.hortonworks.com:21000/api/atlas/entities/63683ca8-e5a9-4c4c-b02e-3fe01bfda2a2

Atlas will respond with the entire entity definition. You can apply the previous steps to find and view any entity type and instance.

2. Once you have identified the entity whose applied tags you want to determine, issue the following request:

curl -u admin:admin -X GET http://sandbox.hortonworks.com:21000/api/atlas/entities/63683ca8-e5a9-4c4c-b02e-3fe01bfda2a2/traits

Atlas will respond with:

{"requestId":"qtp86996017-76 - 6a8faae6-627f-4f77-a6b2-21d74f1cacca","results":[],"count":0}

If the count is greater than 0 and the results field has a reference to a tag (trait) instance, then the entity has been tagged.

3. To get all defined tags (traits), make the following request:

curl -u admin:admin -X GET http://sandbox.hortonworks.com:21000/api/atlas/types?type=TRAIT

Atlas will respond with:

{"results":["publish"],"count":1,"requestId":"qtp86996017-248 - 16e3295b-9ed3-4c3f-9910-826218e93d89"}

The payload contains the names of each tag that has been created. In this example, you can view the details of the publish tag (trait) as follows:

curl -u admin:admin -X GET http://sandbox.hortonworks.com:21000/api/atlas/types/publish

Atlas will respond with:

{"typeName":"publish","definition":{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"publish","typeDescription":"","attributeDefinitions":[]}],"classTypes":[]},"requestId":"qtp86996017-11 - 68f0d5af-05da-4254-b299-06e3d1dfa881"}
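To tie the steps together, here is a small sketch that pulls the GUIDs out of the entities response with plain shell tools so each entity's traits can be checked in a loop. The sample response literal mirrors the shape Atlas returns above; the curl loop is commented out since it needs a live Atlas instance:

```shell
# Sample response in the same shape Atlas returned for ?type=hive_table.
RESPONSE='{"typeName":"hive_table","results":["63683ca8-e5a9-4c4c-b02e-3fe01bfda2a2","563e7954-7d4c-45a6-9237-3e94e4d23f68"],"count":2}'

# Extract the contents of the "results" array, one GUID per line.
GUIDS=$(echo "$RESPONSE" | sed 's/.*"results":\[\([^]]*\)\].*/\1/' | tr -d '"' | tr ',' '\n')
echo "$GUIDS"

# For each GUID, ask Atlas which tags (traits) are applied:
# for g in $GUIDS; do
#   curl -u admin:admin -X GET "http://sandbox.hortonworks.com:21000/api/atlas/entities/$g/traits"
# done
```

Any entity whose traits response has count greater than 0 carries at least one tag.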
06-27-2016
12:11 PM
@Ethan Hsieh The entities don't necessarily have any lineage when they are first created. Hive tables that were not the source of some other table or registered data structure will show no lineage. Try creating a new table from the existing, already registered table using a CREATE TABLE ... AS SELECT ... FROM ... statement. That creates another table from the existing Hive table, and the Hive hook should register that lineage. You should then be able to see lineage from both the parent and the child table.
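A concrete sketch of that CTAS step, run through beeline (table names and the JDBC URL are placeholders; executing it requires a running HiveServer2 with the Atlas Hive hook enabled):

```shell
# A CTAS statement: the Hive hook should report lineage from the source
# table to the new table into Atlas.
SQL="CREATE TABLE employees_copy AS SELECT * FROM employees"
echo "$SQL"

# Run it against HiveServer2 (placeholder connection string):
# beeline -u jdbc:hive2://sandbox.hortonworks.com:10000 -e "$SQL"
```

After the statement completes, searching Atlas for either table should show a lineage graph linking the two through a hive_process entity.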
06-24-2016
08:36 PM
1 Kudo
@Vance Wei Try using Gremlin Server; this should give you access to the Titan layer and the ability to affect the underlying BerkeleyDB store. http://s3.thinkaurelius.com/docs/titan/0.9.0-M1/server.html
06-24-2016
08:31 PM
@Ethan Hsieh The following labs demonstrate some of the built-in hooks that Atlas has with components of HDP:

http://hortonworks.com/hadoop-tutorial/tag-based-policies-atlas-ranger/
http://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/

Check out this repo for an example of a Nifi Reporting Task using the Atlas Java API to capture Nifi provenance in Atlas: https://community.hortonworks.com/content/repo/39432/nifi-atlas-lineage-reporter.html You can use the same approach to integrate any tool that gives you the ability to embed Java code or call out to a REST API: http://atlas.incubator.apache.org/0.6.0-incubating/api/rest.html

Check out this thread for a detailed example of how to write a JSON description of a tag (trait), register the tag via the Atlas REST API, add the tag via the UI (this can be done via the REST API as well), and then use the REST API to see the entity with the tag attached as JSON: https://community.hortonworks.com/questions/33501/how-to-create-attribute-sets-and-collections-using.html#answer-40511 Also see this one for an example of how to retrieve entities and Hive lineage information via the REST API: https://community.hortonworks.com/questions/38380/how-can-we-get-hive-lineage-data-using-rest-api-in.html
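For reference, registering a tag from a JSON description boils down to one POST. A minimal sketch, assuming the Atlas v1 traitTypes format shown in the linked thread (the tag name "confidential" is a made-up example; untested against a live server):

```shell
# A minimal trait (tag) type definition with no attributes.
TRAITDEF='{"enumTypes":[],"structTypes":[],"classTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"confidential","typeDescription":"Example tag","attributeDefinitions":[]}]}'
echo "$TRAITDEF"

# Register the tag type with Atlas (requires a running instance):
# curl -u admin:admin -H 'Content-Type: application/json' \
#   -X POST -d "$TRAITDEF" http://sandbox.hortonworks.com:21000/api/atlas/types
```

After registration, the new tag should appear in GET /api/atlas/types?type=TRAIT and be assignable to entities from the UI or the API.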
06-24-2016
08:19 PM
@Ethan Hsieh 1. Audit in this case means the ability to see how the data set was first created and how it has been altered since landing on the cluster. As of Atlas 0.6, data can be tracked as it moves through and from the following components:

- Hive Bridge
- Sqoop Bridge
- Falcon Bridge
- Storm Bridge

Other components can use the Atlas REST and Java APIs to register lineage. Check out this repo for an example of an Apache Nifi Reporting Task that registers provenance with Atlas: https://community.hortonworks.com/content/repo/39432/nifi-atlas-lineage-reporter.html Once the components of a modern data application are integrated with Atlas, concepts like data governance and, in combination with Apache Ranger, tag-based security policies become possible. 2. Check out these labs to get an understanding of how to enable and navigate cross-component, data-set-level lineage: http://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/ http://hortonworks.com/hadoop-tutorial/tag-based-policies-atlas-ranger/ 3. You need to search for and find entities before you can assign tags to them. Try the labs in the above links; along the way you should see the links to add tags, also referred to as traits, to the resulting entities.
06-17-2016
06:48 PM
2 Kudos
@Laura Ngo I ran your type definition through the REST API for Atlas 0.6. It is valid and shows up as a valid tag in the Atlas UI. From the Atlas UI, I added the tag to a hive table entity successfully and then gave the tag some attributes. The UI does not appear to be able to show or edit the array-type attributes on tags, but I was able to call the REST API (the GUID at the end of the URI is the entity id):

curl -X GET http://localhost:21000/api/atlas/entities/a6f3e6c8-57f6-45ce-98e7-ea14a1f29211

and get the following result:

{
"requestId": "qtp1635546341-110 - 8bb20991-84e5-4dc5-a678-3dbe81cb52a2",
"GUID": "a6f3e6c8-57f6-45ce-98e7-ea14a1f29211",
"definition": {
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id": {
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id": "a6f3e6c8-57f6-45ce-98e7-ea14a1f29211",
"version": 0,
"typeName": "hive_column"
},
"typeName": "hive_column",
"values": {
"comment": null,
"qualifiedName": "hr.employee.location@erietp",
"type": "string",
"name": "location"
},
"traitNames": [
"PII",
"api_test_set"
],
"traits": {
"PII": {
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName": "PII",
"values": {
}
},
"api_test_set": {
"jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
"typeName": "api_test_set",
"values": {
"collection_test": [
"test"
],
"set_test": [
"test"
]
}
}
}
}
}

Notice that the collection_test and set_test values show up as arrays, denoted by [ ], and they are populated. As stated before, these will not show up in the UI, but the values are persisted. I have not tried to add more elements to the arrays within the trait through the REST API, but I don't see any reason why that would not work. If you were planning to use the values within the tags for some custom purpose, you should be good to go.