What is the role of Atlas Tag Attribute?

Super Guru

What is the role of the Atlas Tag Attribute in a business context and in a technical context? I want to understand how I would leverage this capability as a technical user and as a business user. Also, how does it apply to the Ranger-Atlas security tag integration?

2 REPLIES

Re: What is the role of Atlas Tag Attribute?

Guru

Tags make data searchable, and they also let you govern types of data in a single place.

If you want to explore how to tag data, check out this article: https://community.hortonworks.com/articles/57722/tag-hive-data-using-apache-atlas.html

For more information on the integration with Ranger, see: http://hortonworks.com/hadoop-tutorial/tag-based-policies-atlas-ranger/

Re: What is the role of Atlas Tag Attribute?

Guru

@Sunile Manjee

Tag attributes let you use tags for more than just data discovery and classification. As you know, tags are decorations applied to entities. Decorated entities are easily identified for whatever purpose the tag is meant to serve. Tag attributes let you attach additional information to any given instance of a tag.

Imagine that you are using Atlas to store metadata about ML models. You may have 5 different versions of a model, each with statistics about how accurate the model is. Now imagine that an automated process needs to know which version of the model should be published to production. You could decorate the model you want to use in production with a tag called "Publish". The process now knows to publish the tagged model, but it does not know where to publish it. This is where tag attributes help. When you tag a model with the "Publish" tag, you can set an attribute on that tag called "HDFS Path" (the location where the dependent Spark process might look for the model). Now the automated process knows which model to publish and where to publish it. The attribute matters because, while every instance of the "Publish" tag has an attribute called "HDFS Path", each instance carries its own value for it.
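To make the "Publish" example concrete, here is a minimal in-memory sketch in Python. It does not call Atlas at all: `ModelEntity`, `TagInstance`, `find_publish_targets`, the model name "churn", the accuracy numbers, and the path value are all hypothetical stand-ins chosen for illustration. It only shows the idea that the tag picks the entity while the attribute on that tag instance carries the per-instance value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TagInstance:
    """One application of a tag to one entity, with its own attribute values."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class ModelEntity:
    """Hypothetical stand-in for an Atlas entity describing one model version."""
    name: str
    version: int
    accuracy: float
    tags: List[TagInstance] = field(default_factory=list)


def find_publish_targets(models: List[ModelEntity]) -> List[Tuple[ModelEntity, str]]:
    """Return (model, path) for every model that carries the "Publish" tag."""
    targets = []
    for model in models:
        for tag in model.tags:
            if tag.name == "Publish":
                # The tag says *which* model; its attribute says *where*.
                targets.append((model, tag.attributes["HDFS Path"]))
    return targets


# Five versions of the same model with made-up accuracy statistics.
models = [
    ModelEntity("churn", version, accuracy)
    for version, accuracy in [(1, 0.81), (2, 0.84), (3, 0.83), (4, 0.88), (5, 0.86)]
]

# Tag version 4 for production; the path value is an assumed example.
models[3].tags.append(
    TagInstance("Publish", {"HDFS Path": "/models/churn/production"})
)

for model, path in find_publish_targets(models):
    print(f"publish {model.name} v{model.version} to {path}")
```

A second model tagged "Publish" would get its own `TagInstance` with a different "HDFS Path" value, which is the per-instance behavior described above.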