Created on 09-25-2016 06:13 PM - edited 08-17-2019 09:44 AM
Release - HDP2.5
Background:
Before this feature was released, Ranger allowed User to create only access policy for hive, that can provide security till column level, but folks might be interested in enabling cell level security then this is the solution they should opt.
Introduction:
This is a newly introduced feature,allows to create new type of ranger-hive policies that help administrator to restrict User from accessing some specific rows in table based on the filter condition in policy or mask the data that is sensitive , this masking can be full or partial.
Lets go in details for both policy type:
Row Filter policy:
Row filter policy allows ranger to specify filter expression in the hive policy, so that users sees only some specifc rows in the table that belong to him. eg: if user belong to US, and query the employee table and we want to restrict him to see only those employee who belong to US, then filter expression will be location = 'US'. there is a new tab we can see HDP2.5 onwards on hive policy page for 'Row filter', we need to provide database and table name in resource, and in condition enter the row filter expression that need to be used for filtering out the result of the hive query run by user part of row filter policy condition. in filter expression must be a valid WHERE clause , even inner condition are accepted.
-Note: 1) there is no column option provide in this policy type because it does not make sense for it.
Column Masking Policy:
Column masking policy allows ranger to specify masking condition in hive policy to mask the sensitive data for specific users, eg. in Bank account no. & cvv is sensitive data of a customer, now in ranger you can create masking policy to mask a column data partial or full for specific user or group.there is a new tab we can see HDP2.5 onwards on hive policy page for 'column masking', we need to provide database,table and column name in the resource and in condition select the masking condition.
following masking conditions are supported currently:
1) Redact:
2) Partial Mask: show last 4
3) partial Mask: show first 4
4) Hash
5) Nullify
6) Unmasked(retain original value)
7) Date: show only year
😎 custom
-Note:
1) wildcards are not allowed in both of the policies type
2) these policies can be created on table or view both.
3) Mask and filter are evaluated while execution of query based on the order they are listed in the policy.
Example:
Now lets take a example and try a row filter and each column masking technique: lets say we have a table called "customer" in "Bank" database:
----+--------------------+--+ | customer.id | customer.name | customer.account | customer.cvv | customer.dob | customer.location | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | 432 | Amit | 898981931313131 | 432 | 1975-04-01 | Delhi | | 493 | John | 79898193128931 | 234 | 1985-09-11 | Bangalore | | 683 | nisar | 69598193128931 | 121 | 1965-09-11 | Bangalore | | 532 | rohan | 198981931313131 | 402 | 1995-04-01 | Delhi | | 400 | Rahul | 69898193128931 | 159 | 1985-09-10 | Bangalore | | 809 | nisar | 59598193128931 | 096 | 1979-09-11 | Bangalore |
Lets create row filter and colum masking policy, and run "select * from customer;" and we will see the difference in results:
1) Row filter policy example: create a row filter policy with filter expression: location = 'Bangalore' for user1
a) result if query executed by user1, "select * from customer;":
+--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | customer.id | customer.name | customer.account | customer.cvv | customer.dob | customer.location | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | 493 | John | 79898193128931 | 234 | 1985-09-11 | Bangalore | | 683 | nisar | 69598193128931 | 121 | 1965-09-11 | Bangalore | | 400 | Rahul | 69898193128931 | 159 | 1985-09-10 | Bangalore | | 809 | nisar | 59598193128931 | 096 | 1979-09-11 | Bangalore | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ 4 rows selected (0.864 seconds)
b) result if query executed by user2, "select * from customer;", since it is not part of policy so it will get all the results:
+--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | customer.id | customer.name | customer.account | customer.cvv | customer.dob | customer.location | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | 432 | Amit | 898981931313131 | 432 | 1975-04-01 | Delhi | | 493 | John | 79898193128931 | 234 | 1985-09-11 | Bangalore | | 683 | nisar | 69598193128931 | 121 | 1965-09-11 | Bangalore | | 532 | rohan | 198981931313131 | 402 | 1995-04-01 | Delhi | | 400 | Rahul | 69898193128931 | 159 | 1985-09-10 | Bangalore | | 809 | nisar | 59598193128931 | 096 | 1979-09-11 | Bangalore | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+
2) column masking policy example: create a column masking policy on table customer, column account with masking condition: location = 'Bangalore' for user1
a) result if query executed by user1, "select * from customer;" , since it is part of policy so it will get masked account number and will show only last 4 numbers in it:
+--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | customer.id | customer.name | customer.account | customer.cvv | customer.dob | customer.location | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | 432 | Amit | xxxxxxxxxxx3131 | 432 | 1975-04-01 | Delhi | | 493 | John | xxxxxxxxxx8931 | 234 | 1985-09-11 | Bangalore | | 683 | nisar | xxxxxxxxxx8931 | 121 | 1965-09-11 | Bangalore | | 532 | rohan | xxxxxxxxxxx3131 | 402 | 1995-04-01 | Delhi | | 400 | Rahul | xxxxxxxxxx8931 | 159 | 1985-09-10 | Bangalore | | 809 | nisar | xxxxxxxxxx8931 | 096 | 1979-09-11 | Bangalore | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ 6 rows selected (0.841 seconds)
b) result if query executed by user2, "select * from customer;", since it is not part of policy so it will get unmasked results:
+--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | customer.id | customer.name | customer.account | customer.cvv | customer.dob | customer.location | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ | 432 | Amit | 898981931313131 | 432 | 1975-04-01 | Delhi | | 493 | John | 79898193128931 | 234 | 1985-09-11 | Bangalore | | 683 | nisar | 69598193128931 | 121 | 1965-09-11 | Bangalore | | 532 | rohan | 198981931313131 | 402 | 1995-04-01 | Delhi | | 400 | Rahul | 69898193128931 | 159 | 1985-09-10 | Bangalore | | 809 | nisar | 59598193128931 | 096 | 1979-09-11 | Bangalore | +--------------+----------------+-------------------+---------------+---------------+--------------------+--+ 6 rows selected (0.649 seconds)
same way we can try out other masking types too.
There are some good use cases listed on the following wiki page please refer it too:
please comment for any question.
Created on 02-02-2017 09:39 PM
To add to Deepak's excellent explanation:
1. Hive Row Level Filter and Column Masking policies do not grant users/ groups the access to resources. You need an access level policy granting at least 'select' privilege to the required user/ group on the resources which will be used in Row Level/ Column Masking policies.
2. In a row level filter policy for a certain DB and table, there might be 2 filters set for a user 'user1' - one explicitly with user='user1' and one implicitly with group = 'public' which user1 is part of. Both the filters will NOT get applied to user1. The filter which appears first will be the one to get applied.
So if there is a policy which has filter for a group, and if there are certain users in group which the filters should not apply to -> there should be filter rows for those users containing NO Row Level Filter, BEFORE the group filter row.
"While determining the filter to apply for a table, Apache Ranger policy engine evaluates the policy-items in the order listed in the policy. The filter specified in the first policy-item that matches the access-request (i.e. user/groups) will be used in the query."
3. Under column masking policies flow: For each column that needs a masking condition, you will need to create a separate policy, even for same user/ group. All of these will be applied for a user/ group when select request comes in.
4. Row level and Column masking policies both work in tandem, they are not mutually exclusive to each other.
Created on 06-20-2017 01:21 PM
While doing masking am getting 1111... instead of cross xxxx...
any suggestion to fix it please