Member since: 03-29-2016
Posts: 36
Kudos Received: 11
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
| 724 | 04-08-2018 09:30 PM
| 388 | 03-15-2018 08:13 PM
| 518 | 03-14-2018 04:10 PM
| 825 | 03-14-2018 03:48 PM
| 603 | 02-19-2018 01:03 AM
04-08-2018
09:30 PM
It's not elegant, but this custom masking expression seems to work:

cast(mask(cast(to_date({col}) as date), 'x', 'x', 'x', -1, '1', 1, 0, -1) as timestamp)
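A quick way to sanity-check the expression in Hive - a hedged sketch where the table customer, the column dob_ts and the beeline URL are all placeholders:

# Compare raw and masked values side by side. With these mask() arguments the
# day is replaced with 1, the month with 0 (January) and the year is left
# unchanged (-1), so only the year should survive in the masked output.
beeline -u 'jdbc:hive2://localhost:10000' -e "
SELECT dob_ts,
       cast(mask(cast(to_date(dob_ts) as date), 'x', 'x', 'x', -1, '1', 1, 0, -1) as timestamp) AS masked_ts
FROM customer
LIMIT 5;"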
04-08-2018
12:54 AM
Is there a way of masking timestamps using the masking policies in Ranger? Perhaps through a custom UDF referenced in the policy? The masking type of "Date: show only year" only works for dates, not timestamps. Looking at GitHub (as the Ranger documentation isn't complete), timestamps cannot be masked, only nullified - https://github.com/myui/hive-udf-backports/tree/master/src/main/java/org/apache/hadoop/hive/ql/udf/generic

mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar, dayValue, monthValue, yearValue)

Supported types: TINYINT, SMALLINT, INT, BIGINT, STRING, VARCHAR, CHAR, DATE

Reason: some systems store dates as timestamps, so a date of birth (PII) could be stored as a timestamp and I need to mask it.
Labels:
- Apache Atlas
- Apache Ranger
03-15-2018
08:16 PM
If you want to change something you need to use PUT. Use POST to add new tags.
03-15-2018
08:13 PM
2 Kudos
The taxonomy is still in tech preview - have you switched it on following the documentation linked below? Do you see a 'Taxonomy' tab in the UI?

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_data-governance/content/atlas_enabling_taxonomy_technical_preview.html

Just a word of caution - in a large production environment, switching the taxonomy on can cause Atlas performance issues. We switched it off until it comes out of tech preview. Here's some info on how to query terms via curl:

https://atlas.apache.org/0.7.1-incubating/AtlasTechnicalUserGuide.pdf

I can't find any info about terms in the latest REST API documentation.

POST http://<atlasserverhost:port>/api/atlas/v1/taxonomies/Catalog/terms/{term_name}

I'm sure someone who knows more than I do will come along soon!
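A hedged illustration of that endpoint in curl (untested - host, credentials and the request body are placeholders, and the v1 taxonomy API is tech preview, so check the guide above):

# Create a term named 'PII' under the default 'Catalog' taxonomy.
curl -u admin:admin -H 'Content-Type: application/json' -X POST \
     -d '{"description": "Personally identifiable information"}' \
     'http://localhost:21000/api/atlas/v1/taxonomies/Catalog/terms/PII'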
03-14-2018
04:10 PM
1 Kudo
I think you are using the wrong API. I believe you need to use PUT /v2/types/typedefs - see https://atlas.apache.org/api/v2/ui/index.html#!/TypesREST/resource_TypesREST_updateAtlasTypeDefs_PUT. To see the current definition of hive_table you can do:

GET http://localhost:21000/api/atlas/v2/types/typedef/name/hive_table
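Put together as curl commands, a hedged sketch (localhost, credentials and the file names are placeholders):

# Fetch the current hive_table definition as a starting point.
curl -u admin:admin \
     'http://localhost:21000/api/atlas/v2/types/typedef/name/hive_table' \
     | python -m json.tool > hive_table_def.json

# After editing, PUT the amended definition back - note it has to be wrapped
# in a typedefs envelope, e.g. {"entityDefs": [ ...the edited def... ]}.
curl -u admin:admin -H 'Content-Type: application/json' -X PUT \
     -d @hive_table_typedefs.json \
     'http://localhost:21000/api/atlas/v2/types/typedefs'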
03-14-2018
03:48 PM
Are you trying to add a description to a tag? If so, you are using the wrong API - the one you are using adds tags to entities, like Hive tables. To add a description to a tag you need to use /v2/types/typedefs (POST to add a new tag, PUT to edit an existing one) - see https://atlas.apache.org/api/v2/ui/index.html#!/TypesREST/resource_TypesREST_createAtlasTypeDefs_POST

{
"classificationDefs":[
{
"createdBy": "admin",
"name": "test_tag_name",
"description": "Description of your tag",
"attributeDefs": [
{
"name":"attribute_name_1",
"typeName":"string",
"isOptional":"true",
"isUnique":"false",
"isIndexable":"true",
"cardinality":"SINGLE"
},
{
"name":"attribute_name_2",
"typeName":"string",
"isOptional":"true",
"isUnique":"false",
"isIndexable":"true",
"cardinality":"SINGLE"
},
{
"name":"update_date",
"typeName":"date",
"isOptional":"true",
"isUnique":"false",
"isIndexable":"true",
"cardinality":"SINGLE"
}],
"superTypes": []
}
]
}
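A hedged example of sending that payload with curl (host and file name are placeholders):

curl -u admin:admin -H 'Content-Type: application/json' -X POST \
     -d @test_tag.json 'http://localhost:21000/api/atlas/v2/types/typedefs'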
03-10-2018
10:15 PM
I can only return a maximum of 100 results when doing a DSL API search in Atlas. Is this by design or a bug? Even with a limit of 1000, only 100 items are returned:

curl -k -u admin:admin -H "Content-type:application/json" -X GET https://url:port/api/atlas/v2/search/dsl?query=hive_column%20where%20__state%3D%27ACTIVE%27%20and%20qualifiedName%20like%20%27prod_%2A_data_lake%2A%27%20select%20qualifiedName%2Cname%2C__guid%20limit%201000 | python -m json.tool > hive_column_prod_data_lake_limit.json

With no limit, still only 100 items are returned - and there are a lot more than 100. When I do the same for hive_table, again only 100 items come back:

curl -k -u admin:admin -H "Content-type:application/json" -X GET https://url:port/api/atlas/v2/search/dsl?query=hive_column%20where%20__state%3D%27ACTIVE%27%20and%20qualifiedName%20like%20%27prod_%2A_data_lake%2A%27%20select%20qualifiedName%2Cname%2C__guid | python -m json.tool > hive_column_prod_data_lake.json
This is on a 2.6.3 HDP install.
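In case it helps anyone hitting the same cap: DSL also accepts an offset clause, so a hedged workaround is to page through results 100 at a time (simplified hive_table query for brevity; whether the 100-row ceiling is the server-side atlas.search.defaultlimit property I haven't confirmed on 2.6.3):

# Page 2 of results (rows 101-200) via 'limit 100 offset 100' in the DSL query.
curl -k -u admin:admin -H "Content-type:application/json" -X GET \
  'https://url:port/api/atlas/v2/search/dsl?query=hive_table%20limit%20100%20offset%20100' \
  | python -m json.tool > hive_table_page2.json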
Labels:
- Apache Atlas
03-06-2018
10:08 PM
1 Kudo
I have tested it on my side on a 2.6.4 sandbox and all 3 were attached for me. (You don't need -X POST together with -d.)

curl -d @atlas_hdfs_path_classification2.json -u admin:admin -H 'Content-Type: application/json; charset=UTF-8' http://127.0.0.1:21000/api/atlas/v2/entity/bulk/classification

{
"classification": {
"typeName": "environment",
"attributes": {}
},
"entityGuids": [
"bb69126d-1e7f-4fef-8e7c-21935ee86bf8",
"c59e438c-5800-495b-bcc2-c9fe543044a6",
"9dfb47d3-01fb-4b25-9833-75c9c36d2a40"
]
}
On a similar note, I wanted to create multiple classifications in one go. I found this worked, where fileToUpload.dat contained a list of JSON files:

<fileToUpload.dat xargs -I % curl -X POST -T "{%}" -u admin:admin -H "Content-Type: application/json" http://127.0.0.1:21000/api/atlas/v2/types/typedefs

But the below only created the first classification - presumably because concatenating two JSON files doesn't produce a single valid JSON document, so only the first one gets parsed:

cat pii.json pii_attribute.json | curl -d@- -u admin:admin -H "Content-Type: application/json" http://127.0.0.1:21000/api/atlas/v2/types/typedefs
03-06-2018
09:15 PM
I'm going to use tag-based policies in the following way:
- Tags at the column level will be used to mask data.
- Tags at the table/database level will be used to drive a multi-tenancy ABAC policy.

This is because if access is given or restricted at column level, you also end up affecting the table/database too, giving undesired consequences - for me anyway. In a multi-tenancy environment:

- one object (table/database) can be in one or many tenancies
- a tenancy may comprise development, test and production environments
- a tenancy will have different landing, staging and data lake areas, i.e. data zones

I can give tag-based access to all production data lake objects in tenancy_xxx to raj_ops. Later, I may want to give test staging and data lake access to holger_gov. Or I may create a new tenancy and want to give the same type of access to raj_ops and holger_gov. Doing this with tags is much simpler to control than RBAC. If I have tagged all my objects with the 3 tags below, creating attribute-based policies is trivial (see the sketch after the list):

- tag = environment (attribute name = name, type = string)
- tag = data_zone (attribute name = name, type = string)
- tag = tenancy_xxx
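As a sketch of the kind of policy condition this enables - hedged and untested, with syntax modelled on the ctx.getAttributeValue conditions in my posts further down this page; whether two tags can be combined in a single expression like this is exactly what I asked about below:

"conditions": [
  {
    "type": "expression",
    "values": ["ctx.getAttributeValue('environment','name').equals('production') && ctx.getAttributeValue('data_zone','name').equals('data_lake')"]
  }
]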
03-05-2018
08:44 PM
You are exactly right - thank you. Both the Atlas tag and the Ranger policy were in caps, but I don't think Ranger Audit likes caps. I changed both to lowercase and now the access is denied. access-denied.png Thanks so much for your help. (I've never been so happy to see an 'access denied' message!)
03-05-2018
05:31 PM
@Madhan Neethiraj - I've changed the Ranger policy via PUT service/public/v2/api/policy/35:

"conditions": [
{
"type": "expression",
"values": ["ctx.getAttributeValue('DATA_ZONE','name').equals('data_lake')"]
}
]
But I'm still not getting the desired behaviour - I would expect holger_gov to be denied access based on the flow below. Here's a screenshot of the Ranger Audit. Could you perhaps paste the full output of your GET service/public/v2/api/policy/{id} policy so I can compare?
Here's mine:
{"id": 35,
"guid": "1d7a6456-840d-4d1d-b5d5-7ec37d50eb8c",
"isEnabled": true,
"createdBy": "Admin",
"updatedBy": "Admin",
"createTime": 1520122079000,
"updateTime": 1520255099000,
"version": 22,
"service": "sandbox_tag",
"name": "tenancy_food",
"policyType": 0,
"description": "",
"resourceSignature": "5b2d59d4b57c1fa990c17143d54c89974270cf8e928f982e03c89055cbc69386",
"isAuditEnabled": true,
"resources": {"tag": {"values": [ "tenancy_food"
],
"isExcludes": false,
"isRecursive": false
}
},
"policyItems": [ {"accesses": [ {"type": "hive:select",
"isAllowed": true
},
{"type": "hive:update",
"isAllowed": true
},
{"type": "hive:create",
"isAllowed": true
},
{"type": "hive:drop",
"isAllowed": true
},
{"type": "hive:alter",
"isAllowed": true
},
{"type": "hive:index",
"isAllowed": true
},
{"type": "hive:lock",
"isAllowed": true
},
{"type": "hive:all",
"isAllowed": true
},
{"type": "hive:read",
"isAllowed": true
},
{"type": "hive:write",
"isAllowed": true
},
{"type": "hive:repladmin",
"isAllowed": true
},
{"type": "hive:serviceadmin",
"isAllowed": true
}
],
"users": [ "holger_gov"
],
"groups": [],
"conditions": [ {"type": "expression",
"values": [ "ctx.getAttributeValue('DATA_ZONE','name').equals('data_lake')"
],
}
],
"delegateAdmin": false
}
],
"denyPolicyItems": [],
"allowExceptions": [],
"denyExceptions": [],
"dataMaskPolicyItems": [],
"rowFilterPolicyItems": [],
}
03-05-2018
01:01 AM
@Madhan Neethiraj - The Ranger Audit looks to me as though only policy 35 is being used. I've attached some screen prints. I'm not sure if it's relevant, but when clicked, the policy shown in Ranger Audit doesn't show the ',', although in the actual policy the ',' is still present.
03-04-2018
08:49 PM
@Madhan Neethiraj I'm having problems getting the tag policy to work. Here are the steps I've taken:

1. Disabled resource policies - holger_gov cannot select on the default and foodmart databases.
2. Created a tag called tennant_xxx.
3. Created a tag policy giving access to holger_gov if the tag is tennant_xxx - so far so good, as holger_gov now has select access.
4. Created a tag called DATA_ZONE with attribute name (type string) and added it to the default and foodmart databases - one with name = data_lake and the other with name = staging.
5. Added policy condition: ctx.getAttributeValue("DATA_ZONE", "name").equals("data_lake")

But holger_gov can still select on both databases - I only want the data_lake one to be selectable. I have tried various combinations to try to get it to work, including the below, but to no avail. Any ideas?

if(ctx.getAttributeValue("DATA_ZONE", "name").equals("data_lake")) {
ctx.result = true;
} else {
ctx.result = false;
}
02-28-2018
12:50 PM
Thanks, @Madhan Neethiraj - that's very helpful indeed. And I like your suggestion about how to structure the Ranger policies - very logical. I will try this out and will post back if I have any other queries about this.
02-23-2018
05:49 PM
1 Kudo
Is it possible to reference more than one Atlas tag in one Ranger policy via the Policy Conditions? I can set up allow or deny tag policies, but would like to reference a combination of tags in the Policy Conditions of one policy. Is this possible?

Example

Let's say I have these 3 tags (tenancy_component with some attributes, tenancy_xxx and tenancy_yyy):

{
"classificationDefs":[
{
"createdBy": "Laura",
"name": "tenancy_component",
"description": "tenancy_component",
"attributeDefs": [
{
"name":"landing",
"typeName":"boolean",
"isOptional":"true",
"isUnique":"false",
"isIndexable":"true",
"cardinality":"SINGLE"
},
{
"name":"staging",
"typeName":"boolean",
"isOptional":"true",
"isUnique":"false",
"isIndexable":"true",
"cardinality":"SINGLE"
},
{
"name":"data_lake",
"typeName":"boolean",
"isOptional":"true",
"isUnique":"false",
"isIndexable":"true",
"cardinality":"SINGLE"
}],
"superTypes": []
}
]
}
{
"classificationDefs":[
{
"createdBy": "Laura",
"name": "tenancy_xxx",
"description": "tenancy_xxx",
"attributeDefs": [
{
}],
"superTypes": []
},
{
"createdBy": "Laura",
"name": "tenancy_yyy",
"description": "tenancy_yyy",
"attributeDefs": [
{
}],
"superTypes": []
}
]
}
I want to provide access (ABAC) to a role such that it doesn't have access to landing unless it is in tenancy_xxx, it has access to the data lake for tenancy_xxx but not tenancy_yyy, and it only has access to staging if it is part of tenancy_yyy.

Database name | Tags | Access
---|---|---
db1 | tenancy_xxx, tenancy_component.landing=true | Access
db2 | tenancy_xxx, tenancy_component.staging=true | Deny
db3 | tenancy_xxx, tenancy_component.data_lake=true | Access
db5 | tenancy_yyy, tenancy_component.landing=true | Deny
db6 | tenancy_yyy, tenancy_component.staging=true | Access
db7 | tenancy_yyy, tenancy_component.data_lake=true | Deny
db7 | tenancy_component.data_lake=true | Deny

How many tag policies should I have, and how would I do it?
Labels:
- Apache Atlas
- Apache Ranger
02-19-2018
01:03 AM
I actually figured it out myself. I needed to use the following JavaScript for the policy conditions:

tagAttr.masking_type=='hash'
tagAttr.masking_type=='nullify'
tagAttr.masking_type=='year'
tagAttr.last_4
02-15-2018
12:25 AM
I want to mask some data. I'm testing in the 2.6.3 sandbox and have created a tag:

{"category": "CLASSIFICATION",
"guid": "bb29dc29-11ba-4d92-8d8f-fdca8ae92ea4",
"createdBy": "holger_gov",
"updatedBy": "holger_gov",
"createTime": 1518326442355,
"updateTime": 1518326442355,
"version": 1,
"name": "test_pii_tag",
"description": "test_pii_tag",
"typeVersion": "1.0",
"attributeDefs": [ {"name": "masking_type",
"typeName": "string",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": 0,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": false
},
{"name": "last_4",
"typeName": "boolean",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": 0,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": false
}
],
"superTypes": [],
}
I have tagged 4 columns on foodmart.customer with test_pii_tag and set the following attributes:

- lname (string attribute masking_type = "hash")
- fname (string attribute masking_type = "nullify")
- address1 (boolean attribute last_4 = true)
- birthdate (string attribute masking_type = "year")

I created one Ranger tag policy and set the following deny settings for raj_ops:

Mask: Hive hash
if ( tagAttr.get('masking_type').equals("hash") ) {
ctx.result = true;
}
Mask: Hive nullify
if ( tagAttr.get('masking_type').equals("nullify") ) {
ctx.result = true;
}
Mask: Hive Date: show only year
if ( tagAttr.get('masking_type').equals("year") ) {
ctx.result = true;
}
Mask: Hive Partial mask show last 4
if ( tagAttr.get('last_4').equals("true") ) {
ctx.result = true;
}
-- I also tried the below with the same results
if ( tagAttr.get('last_4') ) {
ctx.result = true;
}

When I run SELECT * FROM customer LIMIT 100; I see the following:

- lname is hashed - as expected
- fname is null - as expected
- address1 is hashed - not as expected
- birthdate is yyyy-01-01 - as expected

What is wrong with my JavaScript expressions to cause address1 to be hashed instead of 'Partial mask show last 4'?
Labels:
- Apache Atlas
- Apache Ranger
01-24-2018
03:48 PM
There's documentation for creating a child/parent relationship for classifications in the UI. What would be the equivalent script for creating the same relationship via the API? Let's say I have a parent classification named 'security_protection'. I want to create child classifications via API request named 'disk_encryption' and 'field_encryption'. What would be the v2 API request to do this? Thanks in advance. Version 2.6.3.
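A hedged, untested guess based on the classificationDefs payloads elsewhere on this page - perhaps the child definitions just name the parent in superTypes, POSTed to http://<atlas-host:port>/api/atlas/v2/types/typedefs:

{
  "classificationDefs": [
    {
      "name": "disk_encryption",
      "description": "disk_encryption",
      "superTypes": ["security_protection"],
      "attributeDefs": []
    },
    {
      "name": "field_encryption",
      "description": "field_encryption",
      "superTypes": ["security_protection"],
      "attributeDefs": []
    }
  ]
}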
Labels:
- Apache Atlas
09-11-2017
10:28 AM
@Eyad Garelnabi - Will the optionality on tag propagation be system-wide, or would it be possible to exclude some tags from being propagated whilst propagating others?
09-11-2017
09:25 AM
Please could someone help me define exactly what is meant by 'depth' in lineage? I understand the default is 3 - is this 3 hive_process entities in each direction?

curl -k -u admin:admin -H "Content-type:application/json" -X GET https://atlas.url:atlas_port/api/atlas/v2/lineage/a2303e0a-e3ff-4823-8c45-c59a86438a77?depth=7 | python -m json.tool

Also, how would I write a curl query that defines both the depth and the direction? Currently I only seem to be able to do one or the other. Below is the direction defined:

curl -k -u admin:admin -H "Content-type:application/json" -X GET https://atlas.url:atlas_port/api/atlas/v2/lineage/a2303e0a-e3ff-4823-8c45-c59a86438a77?direction=INPUT | python -m json.tool

On the UI side, is the lineage graph defaulted to a depth of 3? Is there any way I can change the output to a custom depth? And is there a way of filtering the lineage graph, for example by tag? I have many, many temp staging tables showing in the lineage graph in our staging area, really making it a mess. I'm not joking when I say there are some tables with a 'line' an inch thick because of the number of temp staging tables!
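One hedged thought on combining depth and direction: both are ordinary query-string parameters, so they should join with '&' - but the URL then has to be quoted, otherwise the shell treats the '&' as its background operator and silently drops everything after it (which would explain only ever getting one parameter to work). Untested sketch:

curl -k -u admin:admin -H "Content-type:application/json" -X GET 'https://atlas.url:atlas_port/api/atlas/v2/lineage/a2303e0a-e3ff-4823-8c45-c59a86438a77?direction=INPUT&depth=7' | python -m json.tool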
Labels:
- Apache Atlas
06-28-2017
08:17 AM
I'm posting this in case anyone finds it useful. There's now a way for metamodels to inherit values from other types - in 0.8 you can use the qualifiedName instead of the guid, which is much better.

Type

POST http://127.0.0.1:21000/api/atlas/v2/types/typedefs

{
"enumDefs":[],
"structDefs":[],
"classificationDefs":[],
"entityDefs":[
{
"superTypes":[
"DataSet"
],
"name":"test_entity_18",
"description":"test_entity_18",
"attributeDefs":[
{
"name":"test_18",
"isOptional":true,
"isUnique":true,
"isIndexable":false,
"typeName":"string",
"valuesMaxCount":1,
"cardinality":"SINGLE",
"valuesMinCount":0
},
{
"name":"test_18_db",
"isOptional":true,
"isUnique":true,
"isIndexable":false,
"typeName":"hive_db",
"valuesMaxCount":1,
"cardinality":"SINGLE",
"valuesMinCount":0
}
]
}
]
}

Entity

POST http://127.0.0.1:21000/api/atlas/v2/entity

{
"entity": {
"typeName": "test_entity_18",
"createdBy": "admin",
"updatedBy": "admin",
"attributes": {
"description": "test decription",
"name": "test_entity_18",
"owner": "admin",
"qualifiedName": "test_entity_18",
"test_18": "attr1",
"test_18_db": {
"typeName": "hive_db",
"uniqueAttributes": {
"qualifiedName": "default@Sandbox"
}
}
},
"guid": -1
},
"referredEntities": {}
}

I learnt this from here - https://issues.apache.org/jira/browse/ATLAS-1506
06-25-2017
07:58 PM
Thank you, that helps enormously! I'm obviously doing something wrong with GET /v2/entity/uniqueAttribute/type/{typeName}. I created an entityDefs type called test_entity_15, then created an entity called test_entity_15:

{
"enumDefs":[],
"structDefs":[],
"classificationDefs":[],
"entityDefs":[
{
"superTypes":[
"DataSet"
],
"name":"test_entity_15",
"description":"test_entity_15",
"attributeDefs":[
{
"name":"test_15_1",
"isOptional":true,
"isUnique":true,
"isIndexable":false,
"typeName":"string",
"valuesMaxCount":1,
"cardinality":"SINGLE",
"valuesMinCount":0
}
]
}
]
}

{
"entity": {
"typeName": "test_entity_15",
"attributes": {
"description": "test_entity_15",
"name": "test_entity_15_1",
"owner": "admin",
"qualifiedName": "test_entity_15@Sandbox",
"test_15_1": "attr1"
},
"guid": -1
},
"referredEntities": {}
}

But when I try GET http://127.0.0.1:21000/api/atlas/v2/entity/uniqueAttribute/type/test_entity_15 I get this error:

{
"errorCode": "ATLAS-400-00-013",
"errorMessage": "Type test_entity_15 with unique attribute does not exist"
}
I set "isUnique" to true, so I'm not sure what else is preventing the unique attribute! Also, I created a JIRA due to the createdBy and updatedBy inconsistency between type and entity POSTs- https://issues.apache.org/jira/browse/ATLAS-1895
06-25-2017
03:31 PM
Thank you for your help @Ashutosh Mestry and @Sarath Subramanian, and apologies for the late response. For anyone who is interested, here's what happened when I posted the entity. Response:

{
"mutatedEntities": {
"CREATE": [ {
"typeName": "test_entity_7",
"attributes": {
"qualifiedName": "test_entity_7_hw@Sandbox"
}
,
"guid": "01960675-149f-43da-bdb8-da79058beb51",
"status": "ACTIVE"
}
],
}
,
"guidAssignments": {
-1: "01960675-149f-43da-bdb8-da79058beb51"
}
} GET http://127.0.0.1:21000/api/atlas/v2/entity/guid/01960675-149f-43da-bdb8-da79058beb51 {
"referredEntities": {
}
,
"entity": {
"typeName": "test_entity_7",
"attributes": {
"owner": "admin",
"test_7_2": "attr2",
"test_7_1": "attr1",
"qualifiedName": "test_entity_7_hw@Sandbox",
"name": "test_entity_7_hw",
"description": "test decription"
}
,
"guid": "01960675-149f-43da-bdb8-da79058beb51",
"status": "ACTIVE",
"createdBy": "holger_gov",
"updatedBy": "holger_gov",
"createTime": 1498267676098,
"updateTime": 1498267676098,
"version": 0,
"classifications": [],
}
}
I'm just testing in a sandbox and using the Chrome app 'Advanced REST client'. Is this why createdBy and updatedBy are set to holger_gov? If I set createdBy and updatedBy explicitly I still get holger_gov:

POST http://127.0.0.1:21000/api/atlas/v2/entity

{
"entity": {
"typeName": "test_entity_7",
"createdBy": "admin",
"updatedBy": "admin",
"attributes": {
"description": "test decription",
"name": "test_entity_7_hw_admin",
"owner": "admin",
"qualifiedName": "test_entity_7_hw_admin@Sandbox",
"test_7_1": "attr1",
"test_7_2": "attr2"
},
"guid": -1
},
"referredEntities": {}
}
Response:

{
"mutatedEntities": {
"CREATE": [ {
"typeName": "test_entity_7",
"attributes": {
"qualifiedName": "test_entity_7_hw_admin@Sandbox"
}
,
"guid": "ed9cf696-cd76-4814-a407-9fdb8d18da3c",
"status": "ACTIVE"
}
],
}
,
"guidAssignments": {
-1: "ed9cf696-cd76-4814-a407-9fdb8d18da3c"
}
}
GET http://127.0.0.1:21000/api/atlas/v2/entity/guid/ed9cf696-cd76-4814-a407-9fdb8d18da3c

{
"referredEntities": {
}
,
"entity": {
"typeName": "test_entity_7",
"attributes": {
"owner": "admin",
"test_7_2": "attr2",
"test_7_1": "attr1",
"qualifiedName": "test_entity_7_hw_admin@Sandbox",
"name": "test_entity_7_hw_admin",
"description": "test decription"
}
,
"guid": "ed9cf696-cd76-4814-a407-9fdb8d18da3c",
"status": "ACTIVE",
"createdBy": "holger_gov",
"updatedBy": "holger_gov",
"createTime": 1498268595794,
"updateTime": 1498268595794,
"version": 0,
"classifications": [],
}
}
Also, how do I use /v2/entity/bulk? I get the following error:

{
"errorCode": "ATLAS-404-00-005",
"errorMessage": "Given instance guid {0} is invalid/not found"
}
I currently can't find a way of doing the GET for an entity without first copying the guid when I create it!
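A hedged sketch of how I suspect /v2/entity/bulk works (untested) - it seems to be guid-driven, taking one guid query parameter per entity, which would explain the 'invalid/not found' error when none are supplied:

curl -u admin:admin -X GET 'http://127.0.0.1:21000/api/atlas/v2/entity/bulk?guid=01960675-149f-43da-bdb8-da79058beb51&guid=ed9cf696-cd76-4814-a407-9fdb8d18da3c'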
06-20-2017
04:19 PM
Please could someone provide an example AtlasEntity REST API POST query I can use on my sandbox? I'm struggling to post a successful one with v0.8. I managed to create a type OK. Here's my entity type:

POST http://127.0.0.1:21000/api/atlas/v2/types/typedefs

{
"enumDefs":[],
"structDefs":[],
"classificationDefs":[],
"entityDefs":[
{
"superTypes":[
"DataSet"
],
"name":"test_entity_7",
"description":"test_entity_7",
"createdBy": "admin",
"updatedBy": "admin",
"attributeDefs":[
{
"name":"test_7_1",
"isOptional":true,
"isUnique":false,
"isIndexable":false,
"typeName":"string",
"valuesMaxCount":1,
"cardinality":"SINGLE",
"valuesMinCount":0
},
{
"name":"test_7_2",
"isOptional":true,
"isUnique":false,
"isIndexable":false,
"typeName":"string",
"valuesMaxCount":1,
"cardinality":"SINGLE",
"valuesMinCount":0
}
]
}
]
}
I know the entity is supposed to be in this form, but could someone help me relate this back to my example type please? (I'm not a coder so I have to learn through examples.)

POST http://127.0.0.1:21000/api/atlas/v2/entity

{
"entity" : {
"guid" : "...",
"status" : "ACTIVE",
"createdBy" : "...",
"updatedBy" : "...",
"createTime" : 12345,
"updateTime" : 12345,
"version" : 12345,
"classifications" : [ {
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
}, {
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
} ],
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
},
"referredEntities" : {
"property1" : {
"guid" : "...",
"status" : "ACTIVE",
"createdBy" : "...",
"updatedBy" : "...",
"createTime" : 12345,
"updateTime" : 12345,
"version" : 12345,
"classifications" : [ {
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
}, {
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
} ],
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
},
"property2" : {
"guid" : "...",
"status" : "DELETED",
"createdBy" : "...",
"updatedBy" : "...",
"createTime" : 12345,
"updateTime" : 12345,
"version" : 12345,
"classifications" : [ {
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
}, {
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
} ],
"typeName" : "...",
"attributes" : {
"property1" : { },
"property2" : { }
}
}
}
}
Labels:
- Apache Atlas
06-18-2017
02:20 PM
Could someone help me understand when I would use the 'options' property? It's part of AtlasBaseTypeDef and has a type of 'map of string'. In the below example I can create an entity type, but I don't know how to use the 'options' property, or under what circumstances I would want to:

{
"enumDefs":[],
"structDefs":[],
"classificationDefs":[],
"entityDefs":[
{
"superTypes":[
"DataSet"
],
"name":"test_entity_3",
"description":"test_entity_3",
"createdBy": "admin",
"updatedBy": "admin",
"options" :
{
"property1" : "when_to_use_1",
"property2" : "when_to_use_2"
},
"attributeDefs":[
{
"name":"test_3",
"isOptional":true,
"isUnique":false,
"isIndexable":false,
"typeName":"string",
"valuesMaxCount":1,
"cardinality":"SINGLE",
"valuesMinCount":0
}
]
}
]
}
Labels:
- Apache Atlas
06-12-2017
07:00 PM
1 Kudo
I'm looking at the documentation for HDP 2.6.1 and from what I can see, traits still use the legacy API rather than v2. Am I reading the documentation correctly? In the section 'Atlas types' it states that the composite metatypes are "Class, Struct, Trait", e.g.

POST /v2/types/classificationdef
POST /v2/types/structdef

But the v2 REST API doesn't have anything for 'trait'. In the section 'Cataloging Atlas Metadata: Traits and Business Taxonomy' we have:

POST http://<atlas-server-host:port>/api/atlas/types

So this is using the legacy API. Is this right? Will this change with future Atlas releases? Also, can someone provide a business use case for using classifications, structs and traits? I'm getting muddled about the circumstances in which I would use each. Many thanks!
Labels:
- Apache Atlas
03-14-2017
12:42 AM
Does v2 ship with 0.8? Which HDP release is that likely to be in - 2.6 or 3?
03-13-2017
06:49 PM
I'd like to define and create a new meta model in Atlas. I would like it to store a date and a value, but also inherit values from existing types: hive tables and dbs. I can create new simple types and entities, but I'm having problems getting the right behaviour for the hive tables and dbs. When I create a simple type with 'superTypes' of Asset and Referenceable I'm able to create simple entities, but when I use DataSet I get errors. Here's a type I tried (I can get values and dates to work, so this is just trying to get the hive_db attribute to work):

{
"enumTypes": [],
"structTypes": [],
"traitTypes": [],
"classTypes": [
{
"superTypes": [
"DataSet"
],
"hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.ClassType",
"typeName": "test_type_class_22",
"typeDescription": null,
"attributeDefinitions": [
{
"dataTypeName": "hive_db",
"isComposite": false,
"isIndexable": true,
"isUnique": false,
"multiplicity": "required",
"name": "db",
"reverseAttributeName": null
}
]
}
]
}
But I don't know how to properly create values that point to the actual 'default' database:

{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference",
"id":{
"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
"id":"-1566683608564093000",
"version":0,
"typeName":"test_type_class_22",
"state":"ACTIVE"
},
"typeName":"test_type_class_22",
"values":{
"name":"default"
"qualifiedName":"default@Sandbox",
"description":"test description",
"owner":null,
"db": "default"
},
"traitNames":[
],
"traits":{
}
}
For the value of 'db' (which is of datatype hive_db) I tried "default", "default@Sandbox", "default.test_type_class_22@sandbox" and "da862503-62f0-41a6-8acc-da4efcf856eb". I just don't know what I should be setting this to. Any ideas? All return an error similar to:

{
"error": "Unable to deserialize json",
"stackTrace": "java.lang.IllegalArgumentException: Unable to deserialize json at org.apache.atlas.services.DefaultMetadataService.deserializeClassInstances(DefaultMetadataService.java:350) at org.apache.atlas.services.DefaultMetadataService.createEntities(DefaultMetadataService.java:323) at org.apache.atlas.web.resources.EntityResource.submit(EntityResource.java:133) at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at org.apache.atlas.web.filters.AuditFilter.doFilter(AuditFilter.java:71) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.apache.atlas.web.filters.AtlasAuthorizationFilter.doFilter(AtlasAuthorizationFilter.java:154) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.apache.atlas.web.filters.AtlasCSRFPreventionFilter$ServletFilterHttpInteraction.proceed(AtlasCSRFPreventionFilter.java:232) at org.apache.atlas.web.filters.AtlasCSRFPreventionFilter.handleHttpInteraction(AtlasCSRFPreventionFilter.java:177) at org.apache.atlas.web.filters.AtlasCSRFPreventionFilter.doFilter(AtlasCSRFPreventionFilter.java:187) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.apache.atlas.web.filters.AtlasAuthenticationFilter.doFilter(AtlasAuthenticationFilter.java:301) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.java:201) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:183) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:192) at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:160) at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346) at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259) at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.atlas.typesystem.IReferenceableInstance at org.apache.atlas.typesystem.types.ClassType.convert(ClassType.java:138) at org.apache.atlas.services.DefaultMetadataService.getTypedReferenceableInstance(DefaultMetadataService.java:365) at org.apache.atlas.services.DefaultMetadataService.deserializeClassInstances(DefaultMetadataService.java:342) ... 85 more "
}
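Re-reading the ClassCastException (a String cannot be cast to IReferenceableInstance), a hedged guess I haven't confirmed: 'db' may need to be an object reference rather than a plain string - something like the below, where the id is the guid of the existing default database:

"db": {
  "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Id",
  "id": "da862503-62f0-41a6-8acc-da4efcf856eb",
  "version": 0,
  "typeName": "hive_db",
  "state": "ACTIVE"
}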
Labels:
- Apache Atlas
03-06-2017
10:05 PM
2 Kudos
I want to use Atlas traits and attributes to hold data quality metadata (counts and dates). I have multiple Hive tables, and for each of them I run basic DQ scripts each day to count the number of anomalies for different DQ checks (at both table and column level). I only expect Atlas to hold the most recent date and count. Example of the sort of DQ metadata I generate:

hive_table | hive_column | Load date | DQ check | DQ count
---|---|---|---|---
table_1 | - | 2017-03-06 | Count number of records | 999
table_1 | column_1 | 2017-03-06 | Number of not nulls | 2
table_1 | column_2 | 2017-03-06 | Number of inconsistent dates | 0
table_2 | - | 2017-03-06 | Count number of records | 9999
table_2 | column_1 | 2017-03-06 | Number of not nulls | 232
table_2 | column_2 | 2017-03-06 | Number of inconsistent dates | 2

I have 2 questions.

1. What is the best way to structure the traits and attributes? Traits: dq_not_null, or dq_not_null_table_column_nn? Attributes: dq_count, or table_column_dq_count? If I were to update attribute values for a trait that is linked to 2 entities (hive_tables), can each value be updated separately, or will the attribute value be shared across the trait? If it is shared then I will need unique trait names (I think).

2. How should I update the attribute values (the values are generated from HQL scripts)? (See the hedged sketch after the JSON below.)

Here's an example of my traits and attributes (but not attribute values) for a DQ check for not nulls:

{
"enumTypes":[],
"structTypes":[],
"traitTypes":[
{
"superTypes":[],
"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType",
"typeName":"dq_monitor_not_null",
"typeDescription":null,
"attributeDefinitions":[
{
"name":"dq_monitor_load_date",
"dataTypeName":"date",
"multiplicity":"optional",
"isComposite":false,
"isUnique":false,
"isIndexable":true,
"reverseAttributeName":null
},
{
"name":"dq_monitor_count",
"dataTypeName":"int",
"multiplicity":"optional",
"isComposite":false,
"isUnique":false,
"isIndexable":true,
"reverseAttributeName":null
}
]
}
],
"classTypes":[]
}
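On question 2, a hedged and untested sketch of how a trait instance with attribute values might be attached to an entity via the legacy API (the guid is a placeholder, and I haven't verified the exact envelope on 0.7):

POST http://localhost:21000/api/atlas/entities/{guid}/traits

{
  "jsonClass": "org.apache.atlas.typesystem.json.InstanceSerialization$_Struct",
  "typeName": "dq_monitor_not_null",
  "values": {
    "dq_monitor_load_date": "2017-03-06",
    "dq_monitor_count": 2
  }
}

If trait instances are attached per entity like this, the values should be per entity rather than shared across the trait - but I'd welcome confirmation.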
Labels:
- Apache Atlas
05-16-2016
02:41 PM
1 Kudo
Please could you advise the correct way of setting up collection and set attributes via the Atlas REST API. I have used the following, but I don't know if I've properly assigned the set or collection to the attribute, as I only know how to check the attributes in the UI... and that's pretty basic in 0.6, so really I'd like to know how to do this in the backend.

{
"enumTypes": [],
"structTypes": [],
"traitTypes": [
{
"superTypes": [],
"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType",
"typeName": "api_test_set",
"attributeDefinitions": [
{
"name": "set_test",
"dataTypeName": "array<string>",
"multiplicity": "set",
"isComposite": false,
"isUnique": false,
"isIndexable": true,
"reverseAttributeName": null
},
{
"name": "collection_test",
"dataTypeName": "array<string>",
"multiplicity": "collection",
"isComposite": false,
"isUnique": false,
"isIndexable": true,
"reverseAttributeName": null
}
]
}
],
"classTypes": []
}
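One hedged way to check outside the UI (untested; host and credentials are placeholders) might be to fetch the type definition back from the legacy types endpoint and inspect the multiplicity that was actually stored:

curl -u admin:admin 'http://localhost:21000/api/atlas/types/api_test_set' | python -m json.tool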
Labels:
- Apache Atlas