I am learning about Atlas tag-based policies but cannot fully understand their purpose. Please go through the lines below and help me understand them better:
Ranger traditionally provided group- or user-based authorization for resources such as a table or column in Hive or a file in HDFS. With the new Atlas-Ranger integration, administrators can conceptualize security policies based on data classification, and not necessarily in terms of tables or columns. Data stewards can easily classify data in Atlas and use the classification in Ranger to create security policies.
From the Hortonworks description of Atlas tag-based policies, I understood that the user raj_ops was restricted from the location and ssn columns, but later, after the PII tag was created in Atlas, he was able to access them. My question is: if we want to give raj_ops access, we can simply exclude his name from the Ranger policy itself, so why create a tag in Atlas and integrate it with Ranger?
Please help me understand the above concepts.
Thanks in advance.
I'm going to use tag-based policies in the following way:
This is because giving or restricting access at the column level also ends up affecting the table and database, with undesired consequences, for me anyway.
In a multi-tenancy environment:
I can give tag-based access to all production data lake objects in tenancy_xxx to raj_ops. Later, I may want to give test staging and data lake access to holger_gov. Or I may create a new tenancy and want to give the same type of access to raj_ops and holger_gov. Doing this with tags is much simpler to control than RBAC.
If I have tagged all my objects with the following 3 tags, then creating attribute-based policies is trivial:
tag = environment (attribute name = name, type = string)
tag = data_zone (attribute name = name, type = string)
tag = tenancy_xxx
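To make the three-tag scheme concrete, here is a minimal Python sketch of how an attribute-based tag policy could be evaluated. It is a toy model, not the Ranger or Atlas API: the `Resource` and `TagPolicy` classes, the `is_allowed` function, the object names, and the attribute values ("production", "test", "data_lake", "staging") are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A data object plus the tags attached to it (hypothetical model)."""
    name: str
    # tag name -> attribute dict, e.g. {"environment": {"name": "production"}}
    tags: dict = field(default_factory=dict)


@dataclass
class TagPolicy:
    """Grants access to its users when every required tag/attribute matches."""
    users: set
    # tag name -> attributes that must match; {} means "tag must be present"
    required: dict


def is_allowed(user, resource, policies):
    """Return True if any policy lists the user and matches the resource's tags."""
    for policy in policies:
        if user not in policy.users:
            continue
        matches = all(
            tag in resource.tags
            and all(resource.tags[tag].get(k) == v for k, v in attrs.items())
            for tag, attrs in policy.required.items()
        )
        if matches:
            return True
    return False


# Illustrative objects; names and attribute values are invented.
prod_table = Resource("sales.orders", {
    "environment": {"name": "production"},
    "data_zone": {"name": "data_lake"},
    "tenancy_xxx": {},
})
test_table = Resource("sales.orders_stg", {
    "environment": {"name": "test"},
    "data_zone": {"name": "staging"},
    "tenancy_xxx": {},
})

policies = [
    # raj_ops: production data lake objects in tenancy_xxx
    TagPolicy({"raj_ops"}, {
        "environment": {"name": "production"},
        "data_zone": {"name": "data_lake"},
        "tenancy_xxx": {},
    }),
    # holger_gov: every test object (staging or data lake) in tenancy_xxx
    TagPolicy({"holger_gov"}, {
        "environment": {"name": "test"},
        "tenancy_xxx": {},
    }),
]

print(is_allowed("raj_ops", prod_table, policies))
print(is_allowed("raj_ops", test_table, policies))
print(is_allowed("holger_gov", test_table, policies))
```

In this model, giving another user the same access means adding a name to one policy's `users` set, and a new object is covered as soon as it carries the right tags, with no per-table or per-column policy to write.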
Hi @Anurag Mishra: Let's take the following example:
An organization has an 'Employee' table in HBase with 2 column families: 'personal_info' (columns: name, age, gender, address) and 'prof_info' (columns: manager, role, salary).
There is also a 'Customers' table in Hive with the following columns: cust_id, name, age, address, email, SSN.
User 'raj_ops' should not be allowed any type of access to customer information except cust_id and name, and should also not be allowed to access any information in the 'Employee' table that the organization deems personally identifiable.
There are many more databases, tables, and other resources that can contain such information, and, just like him, there are many other users who are not allowed to see data of a certain 'type' or 'classification', such as 'customer' data, 'PII' data, 'sensitive' data, or 'classified' data. The data could be spread across different services (HDFS, Hive, HBase, etc.) and across various resources.
It would be overwhelming to keep track of every such resource and to create multiple policies as more and more of this data becomes available. That's where tags and tag-based policies come in handy.
The 'personal_info' column family in the HBase table 'Employee' and the columns 'age', 'address', 'email', and 'SSN' of the Hive table 'Customers' can be tagged with a tag 'Sensitive', and a tag-based policy can be created on the tag 'Sensitive' that denies users like 'raj_ops' all HBase and Hive operations on all resources tagged with 'Sensitive'.
Please note that a single tag-based policy on a tag lets you allow or restrict operations across all services, such as Hive, HBase, Knox, and Storm, because the same tag can be attached to various resources, such as an HBase column family or a Hive table.
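The 'Sensitive' example can be sketched as a small Python model. This is only an illustration of the idea, not how Ranger or Atlas is actually called: the `RESOURCE_TAGS` and `DENY_BY_TAG` structures, the `is_denied` function, the way resources are named, and the HDFS path are all assumptions. It also models only the deny decision coming from the tag policy, not any resource-based policies that may exist alongside it.

```python
# Toy model only: these structures are NOT Ranger or Atlas APIs.
# Each resource is identified by (service, path) and mapped to its tags.
RESOURCE_TAGS = {
    ("hbase", "Employee:personal_info"): {"Sensitive"},
    ("hbase", "Employee:prof_info"): set(),
    ("hive", "Customers.cust_id"): set(),
    ("hive", "Customers.name"): set(),
    ("hive", "Customers.age"): {"Sensitive"},
    ("hive", "Customers.address"): {"Sensitive"},
    ("hive", "Customers.email"): {"Sensitive"},
    ("hive", "Customers.SSN"): {"Sensitive"},
}

# One tag-based policy: tag name -> users denied on anything carrying that tag.
DENY_BY_TAG = {"Sensitive": {"raj_ops"}}


def is_denied(user, service, path):
    """Return True if any tag on the resource has a deny entry for the user."""
    tags = RESOURCE_TAGS.get((service, path), set())
    return any(user in DENY_BY_TAG.get(tag, set()) for tag in tags)


# The same single policy covers both services.
print(is_denied("raj_ops", "hive", "Customers.SSN"))
print(is_denied("raj_ops", "hive", "Customers.name"))
print(is_denied("raj_ops", "hbase", "Employee:personal_info"))

# Tagging a new resource (hypothetical HDFS path) needs no policy change.
RESOURCE_TAGS[("hdfs", "/data/customers/export.csv")] = {"Sensitive"}
print(is_denied("raj_ops", "hdfs", "/data/customers/export.csv"))
```

The point of the sketch is that `DENY_BY_TAG` never changes as data grows: only the tagging does, which is the work data stewards do in Atlas.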
Hope this helps!