How to update Atlas trait attribute values


Contributor

I want to use Atlas traits and attributes to hold data quality metadata (counts and dates).

I have multiple Hive tables, and for each of them I run basic DQ scripts every day that count the number of anomalies found by different DQ checks, at both table and column level. I only expect Atlas to hold the most recent date and count for each check.

Example of the sort of DQ metadata I generate:

hive_table  hive_column  Load date   DQ check                      DQ count
table_1     -            2017-03-06  Count number of records       999
table_1     column_1     2017-03-06  Number of not nulls           2
table_1     column_2     2017-03-06  Number of inconsistent dates  0
table_2     -            2017-03-06  Count number of records       9999
table_2     column_1     2017-03-06  Number of not nulls           232
table_2     column_2     2017-03-06  Number of inconsistent dates  2

I have 2 questions.

1. What is the best way to structure the traits and attributes?

Traits:

  • dq_not_null; or
  • dq_not_null_table_column_nn

Attributes:

  • dq_count; or
  • table_column_dq_count

If I update the attribute values for a trait that is linked to 2 entities (hive_tables), can each entity's value be updated separately, or is the attribute value shared by every entity that carries the trait? If it is shared, then I think I will need unique trait names.

2. How should I update the attribute values (the values are generated from HQL scripts)?

Here's an example of my traits and attributes (but not attribute values) for a DQ check for not nulls.

{
  "enumTypes": [],
  "structTypes": [],
  "traitTypes": [
    {
      "superTypes": [],
      "hierarchicalMetaTypeName": "org.apache.atlas.typesystem.types.TraitType",
      "typeName": "dq_monitor_not_null",
      "typeDescription": null,
      "attributeDefinitions": [
        {
          "name": "dq_monitor_load_date",
          "dataTypeName": "date",
          "multiplicity": "optional",
          "isComposite": false,
          "isUnique": false,
          "isIndexable": true,
          "reverseAttributeName": null
        },
        {
          "name": "dq_monitor_count",
          "dataTypeName": "int",
          "multiplicity": "optional",
          "isComposite": false,
          "isUnique": false,
          "isIndexable": true,
          "reverseAttributeName": null
        }
      ]
    }
  ],
  "classTypes": []
}
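
For reference, here is a minimal sketch of how rows like the ones in the table above might be reduced to the most recent date and count per table/column before anything is sent to Atlas. The function name latest_not_null_counts and the row layout are hypothetical; only the attribute names dq_monitor_load_date and dq_monitor_count come from the type definition above, and the sketch assumes load dates are ISO-formatted strings so that they sort correctly as text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: (hive_table, hive_column, load_date, dq_check, dq_count)
Row = Tuple[str, Optional[str], str, str, int]
Key = Tuple[str, Optional[str]]


def latest_not_null_counts(rows: List[Row]) -> Dict[Key, dict]:
    """Keep only the most recent 'Number of not nulls' result per (table, column)."""
    latest: Dict[Key, dict] = {}
    for table, column, load_date, check, count in rows:
        if check != "Number of not nulls":
            continue  # other checks would map to their own trait types
        key = (table, column)
        current = latest.get(key)
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if current is None or load_date > current["dq_monitor_load_date"]:
            latest[key] = {
                "dq_monitor_load_date": load_date,
                "dq_monitor_count": count,
            }
    return latest
```

Each value in the returned dictionary has the same two attribute names as the dq_monitor_not_null trait, so it could serve as the attribute values for one tagged entity.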

Re: How to update Atlas trait attribute values

Guru

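Each Atlas tag (trait) can have multiple attributes, stored as name/value pairs. The values belong to each tagged entity, not to the tag itself. If you had a tag with an attribute called owner, you could apply that tag to 2 Hive tables and then give each table a different value, so you do not need a unique tag name per table.

For example: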

Tag1 --> Hive Table 1

Owner = user1

Tag1 --> Hive Table 2

Owner = user2
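
Applied to the dq_monitor_not_null trait from the question, the same idea can be sketched as below. This is only a sketch under assumptions: it assumes an Atlas version that exposes the v2 REST API, where (as far as I know) classifications on an entity are updated with a PUT to /api/atlas/v2/entity/guid/{guid}/classifications. The host name, the GUIDs and the date string format are hypothetical placeholders, so check them against your own Atlas version. The code only builds the requests and does not send them.

```python
import json
import urllib.request

# Hypothetical Atlas endpoint; replace with your own host and port.
ATLAS_URL = "http://atlas-host:21000"


def classification_payload(load_date: str, count: int) -> list:
    """Build the body for one entity: same tag type, entity-specific values."""
    return [{
        "typeName": "dq_monitor_not_null",
        "attributes": {
            # Assumption: the accepted date format depends on the Atlas version.
            "dq_monitor_load_date": load_date,
            "dq_monitor_count": count,
        },
    }]


def build_update_request(guid: str, load_date: str, count: int) -> urllib.request.Request:
    """Prepare (but do not send) a PUT that updates the tag values on one entity."""
    body = json.dumps(classification_payload(load_date, count)).encode("utf-8")
    return urllib.request.Request(
        url=f"{ATLAS_URL}/api/atlas/v2/entity/guid/{guid}/classifications",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Same tag on two entities, each with its own count (GUIDs are placeholders).
req_table_1 = build_update_request("guid-of-table-1", "2017-03-06T00:00:00.000Z", 2)
req_table_2 = build_update_request("guid-of-table-2", "2017-03-06T00:00:00.000Z", 232)
```

To actually send a request you would pass it to urllib.request.urlopen with whatever authentication your cluster requires. A daily DQ job could run this once per tagged table or column after the HQL scripts finish.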

Is this what you are asking?

Hope this is helpful.
