Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
avatar
Contributor

Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the Big Data platform. Ranger allows you to centralize security administration to manage all security related tasks in a central UI or using REST APIs. You can setup fine grained authorization to do a specific action and/or operation with the platform's components/tools through role based and/or attribute based access control. Ranger also supports central auditing of all events across the cluster on access/edit/update/delete operations.

Ranger has support for the following components through plugins:

  • HDFS
  • HBase
  • Hive
  • YARN
  • KNOX
  • Storm
  • SOLR
  • Kafka
  • NiFi
  • NiFi-Registry
  • Atlas

This article will provide an example of setting up Ranger database policies as well as column masking rules. The example will also show what happens when you have two tables, with one table having column masking rules applied for a specific user, and trying to join the two tables.

Scenario:

  • Two users: user_sales, user_hr
  • Two tables: table_a, table_b
  • There is a common column between the two, allowing you to join the two tables together.
  • table_a has a column masking policy for user_sales, who is not allowed to see the values in this column.
  • We join the two tables together, and see the difference what user_sales see, and what user_hr sees

Ranger Policy Setup

In this screenshot, you can see that user_sales and user_hr have been given permission to access table_a and table_b:

ranger1.JPG

However, in the following screenshot, user_sales is being prevented from seeing the col_masked values by a masking policy in table_a:

ranger2.JPG

As user_hr, I can see both tables:

 

select * from table_a;
+---------------+---------------------+
| table_a.name  | table_a.col_masked  |
+---------------+---------------------+
| Mark          | 1                   |
| Jane          | 2                   |
| Steve         | 3                   |
+---------------+---------------------+

select * from table_b;
+-------------------+-----------------------+
| table_b.location  | table_b.col_unmasked  |
+-------------------+-----------------------+
| London            | 1                     |
| Amsterdam         | 2                     |
| Paris             | 3                     |
+-------------------+-----------------------+

 

user_hr can also join both tables together on the shared column without any issue:

 

select a.*, b.* from table_a a, table_b b where a.col_masked = b.col_unmasked;
+---------+---------------+-------------+-----------------+
| a.name  | a.col_masked  | b.location  | b.col_unmasked  |
+---------+---------------+-------------+-----------------+
| Mark    | 1             | London      | 1               |
| Jane    | 2             | Amsterdam   | 2               |
| Steve   | 3             | Paris       | 3               |
+---------+---------------+-------------+-----------------+

 

As user_sales, I'm allowed to see everything in table_b, but I'm not allowed to see the masked column in table_a:

 

select * from table_a;
+---------------+---------------------+
| table_a.name  | table_a.col_masked  |
+---------------+---------------------+
| Mark          | NULL                |
| Jane          | NULL                |
| Steve         | NULL                |
+---------------+---------------------+

select * from table_b;
+-------------------+-----------------------+
| table_b.location  | table_b.col_unmasked  |
+-------------------+-----------------------+
| London            | 1                     |
| Amsterdam         | 2                     |
| Paris             | 3                     |
+-------------------+-----------------------+

 

When I try and join the table, Hive will prevent the join from happening as the joined columns don't match up:

 

+---------+---------------+-------------+-----------------+
| a.name  | a.col_masked  | b.location  | b.col_unmasked  |
+---------+---------------+-------------+-----------------+
+---------+---------------+-------------+-----------------+

 

When running the join operation in Hive, Hive will apply the masking operation before performing the join, thus preventing the rows from matching each other and generating a result set.

 

1,295 Views