Community Articles

dvergar · ‎10-14-2022

Apache Ranger is a powerful tool for managing authorization policies across the entire Hadoop ecosystem. It enforces security for services such as HDFS, Hive, HBase, and Kafka, as well as for object storage such as Apache Ozone, Amazon S3, and Azure ADLS.

By default, Ranger’s policies are matched against the username or the groups the requesting user belongs to. In some situations, however, user information must come from other data sources, such as Active Directory extended attributes or an external RDBMS. This is where Dynamic Policy Hooks come in handy: they let Ranger administrators extend the software by enriching the information passed to the authorization engine and by applying custom logic to authorization decisions.

Writing a Dynamic Policy Hook is as simple as extending two abstract classes provided by Ranger’s core: RangerAbstractContextEnricher and RangerAbstractConditionEvaluator.

Use case

Let’s imagine a use case in which a group of users named “datascientist” needs to access a Hive table that contains customers’ data. Users in this group can select all fields, but for privacy reasons the column containing the last name must be masked, except for data scientists who are tagged with the specific role “dpo”. The tags associated with each user are stored in an external MySQL-compatible database (MariaDB in this walkthrough), so the Ranger plugin must fetch this information before applying the policy.

The visibility rules are as follows:

Users belonging to the “datascientist” group in Active Directory should have “select” privileges on the customer table, but the “c_last_name” column must be masked.
Users who belong to the “datascientist” group in Active Directory and are also tagged as “dpo” can access the whole table, including viewing the “c_last_name” column in clear.
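
The two rules above can be written down as a small decision function. This is only a sketch in plain Java: the class, enum, and method names are hypothetical and not part of Ranger, and treating users outside the group as denied is an assumption (no allow condition in this use case covers them).

```java
import java.util.Set;

// Hypothetical helper, not a Ranger class: encodes the two visibility rules.
final class VisibilityRules {
    enum Access { DENIED, MASKED, CLEAR }

    // Decide how c_last_name is exposed, given the user's groups and tags.
    static Access lastNameAccess(Set<String> groups, Set<String> tags) {
        if (!groups.contains("datascientist")) {
            // Assumption: nobody outside the group is granted select here.
            return Access.DENIED;
        }
        // Data scientists tagged "dpo" see the column in clear; the rest see it masked.
        return tags.contains("dpo") ? Access.CLEAR : Access.MASKED;
    }

    public static void main(String[] args) {
        System.out.println(lastNameAccess(Set.of("davide", "datascientist"), Set.of("dpo", "analyst")));
        System.out.println(lastNameAccess(Set.of("bob", "datascientist"), Set.of("analyst")));
    }
}
```

In the real setup this decision is not coded in one place: the access and masking policies cover the group part, and the Dynamic Policy Hook covers the tag part.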

Implementation

First we have to install our CDP PvC Base Cluster, with users davide, bob and alice belonging to the datascientist group:

#> id davide
uid=39971(davide) gid=39974(davide) groups=39974(davide),39972(datascientist)
#> id bob
uid=39974(bob) gid=39977(bob) groups=39977(bob),39972(datascientist)
#> id alice
uid=39975(alice) gid=39978(alice) groups=39978(alice),39971(developer),39972(datascientist)

Then, create Ranger policies to allow data scientists to access the customer table, with column c_last_name masked:

On Ranger, select the Hadoop SQL repository and create a new access policy:

Policy Name -> tpcds

Database -> tpcds_bin_partitioned_orc_5

Table -> customer

Column -> *

And add the following Allow Condition:

Select Group -> datascientist

Permission -> select

Now, let's create the Masking policy:

Policy Name -> customer_last_name_masking

Database -> tpcds_bin_partitioned_orc_5

Table -> customer

Column -> c_last_name

And add the following Masking Condition:

Select Group -> datascientist

Permission -> select

Select Masking Options -> Hash

Let’s check if our policies work:

#> kinit davide
Password for davide@CLOUDERA.LOCAL:
#> beeline -e "select current_groups()as groups, c_last_name from tpcds_bin_partitioned_orc_5.customer limit 10" 2>/dev/null
+-----------------------------+---------------------------------+
|           groups            |           c_last_name           |
+-----------------------------+---------------------------------+
| ["davide","datascientist"]  | 9c2d2409e88bb1351f690444e56c81  |
| ["davide","datascientist"]  | c87bd09bf1868b0222e3ed7627069a  |
| ["davide","datascientist"]  | fac0505ae34b0e78b3a459606a24c3  |
| ["davide","datascientist"]  | 1fc30d54c262e5502e1bff63e36420  |
| ["davide","datascientist"]  | 721e5108bbf13544019ece1061c239  |
| ["davide","datascientist"]  | 7569c3f1c51ef6c897c45b082f17c6  |
| ["davide","datascientist"]  | 1b9c9f7f6877bfc15011e05df96067  |
| ["davide","datascientist"]  | e42cacd863ebf10c6f425874b8eaf7  |
| ["davide","datascientist"]  | 635f445b280626d97c548917fc5bc0  |
| ["davide","datascientist"]  | 07908afd166b63b627eb616c6232de  |
+-----------------------------+---------------------------------+
#> kinit bob
Password for bob@CLOUDERA.LOCAL:
#> beeline -e "select current_groups()as groups, c_last_name from tpcds_bin_partitioned_orc_5.customer limit 10" 2>/dev/null
+--------------------------+---------------------------------+
|          groups          |           c_last_name           |
+--------------------------+---------------------------------+
| ["bob","datascientist"]  | 9c2d2409e88bb1351f690444e56c81  |
| ["bob","datascientist"]  | c87bd09bf1868b0222e3ed7627069a  |
| ["bob","datascientist"]  | fac0505ae34b0e78b3a459606a24c3  |
| ["bob","datascientist"]  | 1fc30d54c262e5502e1bff63e36420  |
| ["bob","datascientist"]  | 721e5108bbf13544019ece1061c239  |
| ["bob","datascientist"]  | 7569c3f1c51ef6c897c45b082f17c6  |
| ["bob","datascientist"]  | 1b9c9f7f6877bfc15011e05df96067  |
| ["bob","datascientist"]  | e42cacd863ebf10c6f425874b8eaf7  |
| ["bob","datascientist"]  | 635f445b280626d97c548917fc5bc0  |
| ["bob","datascientist"]  | 07908afd166b63b627eb616c6232de  |
+--------------------------+---------------------------------+
#> kinit alice
Password for alice@CLOUDERA.LOCAL:
#> beeline -e "select current_groups()as groups, c_last_name from tpcds_bin_partitioned_orc_5.customer limit 10" 2>/dev/null
+----------------------------------------+---------------------------------+
|                 groups                 |           c_last_name           |
+----------------------------------------+---------------------------------+
| ["alice","developer","datascientist"]  | 9c2d2409e88bb1351f690444e56c81  |
| ["alice","developer","datascientist"]  | c87bd09bf1868b0222e3ed7627069a  |
| ["alice","developer","datascientist"]  | fac0505ae34b0e78b3a459606a24c3  |
| ["alice","developer","datascientist"]  | 1fc30d54c262e5502e1bff63e36420  |
| ["alice","developer","datascientist"]  | 721e5108bbf13544019ece1061c239  |
| ["alice","developer","datascientist"]  | 7569c3f1c51ef6c897c45b082f17c6  |
| ["alice","developer","datascientist"]  | 1b9c9f7f6877bfc15011e05df96067  |
| ["alice","developer","datascientist"]  | e42cacd863ebf10c6f425874b8eaf7  |
| ["alice","developer","datascientist"]  | 635f445b280626d97c548917fc5bc0  |
| ["alice","developer","datascientist"]  | 07908afd166b63b627eb616c6232de  |
+----------------------------------------+---------------------------------+

So, our policies work as expected for all users in the datascientist group.

Now let’s implement our custom logic. As a first step, let’s create a MariaDB database that will host our tags:

#> yum install -y mariadb-server
…
Installed:
  mariadb-server.x86_64 1:5.5.68-1.el7

Dependency Installed:
  libaio.x86_64 0:0.3.109-13.el7               perl-DBD-MySQL.x86_64 0:4.023-6.el7               perl-DBI.x86_64 0:1.627-4.el7               perl-Net-Daemon.noarch 0:0.48-5.el7               perl-PlRPC.noarch 0:0.2020-14.el7
Complete!
#> systemctl start mariadb
#> mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 5.5.68-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create user 'user'@'localhost' identified by 'password';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> grant all privileges on repository.* to 'user'@'localhost';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Then create the “usertags” table in the repository database (create the database first if it does not already exist). The table has just two columns: one for the username, the other for a comma-separated list of tags. User davide also has the dpo tag:

#> mysql -u user -ppassword -h localhost repository
MariaDB [repository]> create table `usertags` (`user` varchar(20) not null primary key, `taglist` varchar(100));
Query OK, 0 rows affected (0.00 sec)

MariaDB [repository]> insert into `usertags` (`user`, `taglist`) values ('davide', 'dpo,analyst'), ('bob','analyst'), ('alice', NULL);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [repository]> select * from `usertags`;
+--------+-------------+
| user   | taglist     |
+--------+-------------+
| alice  | NULL        |
| bob    | analyst     |
| davide | dpo,analyst |
+--------+-------------+
3 rows in set (0.00 sec)
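
Because the taglist column holds a comma-separated string (or NULL), whatever code reads it has to split it into individual tags. The sketch below shows one way to do that in plain Java. The class and method names are hypothetical, and returning an empty array for NULL is an assumption rather than the behaviour of the original enricher.

```java
import java.util.Arrays;

// Hypothetical helper: turns a taglist value such as "dpo,analyst" into an array of tags.
final class TagListParser {
    static String[] parse(String taglist) {
        // A NULL or blank taglist (like alice's row) yields no tags.
        if (taglist == null || taglist.trim().isEmpty()) {
            return new String[0];
        }
        return Arrays.stream(taglist.split(","))
                .map(String::trim)          // tolerate spaces around commas
                .filter(t -> !t.isEmpty())  // ignore empty entries
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("dpo,analyst"))); // prints [dpo, analyst]
        System.out.println(Arrays.toString(parse(null)));          // prints []
    }
}
```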

After the DB is created it’s time to define our custom policies, using Ranger’s Dynamic Policy Hooks. To do so, we have to extend the following classes and implement a couple of methods.

The first class we have to extend is RangerAbstractContextEnricher. In this class we override the enrich method, adding custom code that retrieves the user’s tag information:

public class RangerCustomTagEnricher extends RangerAbstractContextEnricher {
  …
  @Override
  public void enrich(RangerAccessRequest rangerAccessRequest) {
    …
    if (rangerAccessRequest != null && cacheUsersTag != null) {
      Map<String, Object> context = rangerAccessRequest.getContext();
      String user = rangerAccessRequest.getUser();
      if (context != null && user != null) {
        // Look up the tags cached for the requesting user
        String[] tags = cacheUsersTag.get(user);
        if (tags != null) {
          // Expose the tags to the condition evaluator through the request context
          context.put(contextName, tags);
        }
      }
    }
  }
  …
}

Once this information has been added to the request context, it is time to evaluate it in the isMatched method of the class that extends RangerAbstractConditionEvaluator.

In this case, since we don’t want to apply the masking policy if the user is tagged as dpo, isMatched returns false if one of the user’s tags matches a value set in the policy, and true otherwise:

public class RangerCustomTagEvaluator extends RangerAbstractConditionEvaluator {
  …
  @Override
  public boolean isMatched(RangerAccessRequest rangerAccessRequest) {
    // true: the condition holds and the masking policy applies
    boolean ret = true;
    if (_allowAny) {
      ret = false;
    } else {
      String[] requestValue = (String[]) rangerAccessRequest.getContext().get(_contextName);
      if (requestValue != null) {
        for (String policyValue : _values) {
          // A tag listed in the policy exempts the user from masking
          if (ArrayUtils.contains(requestValue, policyValue)) {
            ret = false;
            break;
          }
        }
      }
    }
    return ret;
  }
  …
}
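
To see how the two steps fit together without a running cluster, the flow can be simulated in plain Java. This is only a sketch: none of the types below are Ranger classes, the context key "USERTAG" is an assumption, and the cache is a hand-built map standing in for the data loaded from the usertags table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-alone simulation of the enrich + evaluate flow; not Ranger code.
final class TagHookSimulation {
    static final String CONTEXT_KEY = "USERTAG"; // assumed context key

    // Enrichment step: copy the user's tags, if any, into the request context.
    static void enrich(String user, Map<String, Object> context, Map<String, String[]> tagCache) {
        String[] tags = tagCache.get(user);
        if (tags != null) {
            context.put(CONTEXT_KEY, tags);
        }
    }

    // Evaluation step: true means the condition holds and masking applies.
    static boolean isMatched(Map<String, Object> context, List<String> policyValues) {
        String[] tags = (String[]) context.get(CONTEXT_KEY);
        if (tags == null) {
            return true; // no tags at all: keep masking
        }
        for (String policyValue : policyValues) {
            if (Arrays.asList(tags).contains(policyValue)) {
                return false; // an exempting tag was found
            }
        }
        return true;
    }

    // Runs both steps for one user and reports whether masking applies.
    static boolean masked(String user, Map<String, String[]> tagCache, List<String> policyValues) {
        Map<String, Object> context = new HashMap<>();
        enrich(user, context, tagCache);
        return isMatched(context, policyValues);
    }

    public static void main(String[] args) {
        Map<String, String[]> cache = new HashMap<>();
        cache.put("davide", new String[] {"dpo", "analyst"});
        cache.put("bob", new String[] {"analyst"});
        // alice has a NULL taglist, so she has no cache entry in this sketch
        for (String user : List.of("davide", "bob", "alice")) {
            System.out.println(user + " masked=" + masked(user, cache, List.of("dpo")));
        }
    }
}
```

With "dpo" as the only policy value, this prints masked=false for davide and masked=true for bob and alice. Note the inverted logic: the condition is attached to the masking policy, so a match on the tag must make the condition fail.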

The full code of both classes can be found here.

Once the code is compiled and packaged into a jar file, we have to put this jar on the Hive classpath. In CDP we can copy the jar into the /opt/cloudera/parcels/CDH/lib/hive/lib/ranger-hive-plugin-impl/ directory and restart HiveServer2.

We will then register this Context Enricher and the Condition Evaluator using the Ranger API. We can first retrieve the service definition via a GET request to the Hive service definition:

#> curl -u admin:password123 -X GET https://$(hostname -f):6182/service/public/v2/api/servicedef/name/hive | tee hive_service.json

Then edit the hive_service.json file, adding the following entries, and push it back with a PUT request:

#> vi hive_service.json
…
"policyConditions": [
  {
    "itemId": 1,
    "name": "USERTAG",
    "evaluator": "com.dvergari.ranger.evaluator.RangerCustomTagEvaluator",
    "evaluatorOptions": {
      "attributeName": "USERTAG"
    },
    "label": "Exclude from condition",
    "description": "Exclude Tag"
  }
],
"contextEnrichers": [
  {
    "itemId": 1,
    "name": "TagEnricher",
    "enricher": "com.dvergari.ranger.enricher.RangerCustomTagEnricher",
    "enricherOptions": {
      "dbUrl": "jdbc:mysql://localhost/repository",
      "username": "user",
      "password": "password",
      "table": "usertags"
    }
  }
],
…

#> curl -u admin:password123 -X PUT --data @hive_service.json -i -H 'Content-Type: application/json' https://$(hostname -f):6182/service/public/v2/api/servicedef/name/hive

In the enricherOptions field we added the information to connect to our MariaDB database.
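
On the Java side, the enricher has to turn these options into a database connection and a query. The sketch below shows one way to validate them, using a plain Map in place of whatever accessor the enricher uses to read its options. The class name, the validation rules, and the generated query are all assumptions, not the original implementation.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical holder for the four enricherOptions shown in the service definition.
final class EnricherDbOptions {
    // Only plain identifiers are accepted as table names, since the name ends up in SQL text.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    final String dbUrl;
    final String username;
    final String password;
    final String table;

    private EnricherDbOptions(String dbUrl, String username, String password, String table) {
        this.dbUrl = dbUrl;
        this.username = username;
        this.password = password;
        this.table = table;
    }

    // Builds the options object, failing fast on missing or suspicious values.
    static EnricherDbOptions from(Map<String, String> options) {
        String table = require(options, "table");
        if (!IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("Invalid table name: " + table);
        }
        return new EnricherDbOptions(require(options, "dbUrl"), require(options, "username"),
                require(options, "password"), table);
    }

    private static String require(Map<String, String> options, String key) {
        String value = options.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing enricher option: " + key);
        }
        return value;
    }

    // Query that would load every user's taglist from the configured table.
    String selectAllTagsQuery() {
        return "SELECT `user`, `taglist` FROM `" + table + "`";
    }
}
```

Validating the table name matters because it cannot be passed as a JDBC bind parameter and has to be concatenated into the statement.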

If we connect to Ranger again, we can now see a new “Policy Condition” field, where we can specify the tag for which the masking policy should not be applied (dpo in our example).

Click Save, then go back to beeline to check that it works:

#> kinit bob
Password for bob@CLOUDERA.LOCAL:
#> beeline -e "select current_groups()as groups, c_last_name from tpcds_bin_partitioned_orc_5.customer limit 10" 2>/dev/null
+--------------------------+---------------------------------+
|          groups          |           c_last_name           |
+--------------------------+---------------------------------+
| ["bob","datascientist"]  | 9c2d2409e88bb1351f690444e56c81  |
| ["bob","datascientist"]  | c87bd09bf1868b0222e3ed7627069a  |
| ["bob","datascientist"]  | fac0505ae34b0e78b3a459606a24c3  |
| ["bob","datascientist"]  | 1fc30d54c262e5502e1bff63e36420  |
| ["bob","datascientist"]  | 721e5108bbf13544019ece1061c239  |
| ["bob","datascientist"]  | 7569c3f1c51ef6c897c45b082f17c6  |
| ["bob","datascientist"]  | 1b9c9f7f6877bfc15011e05df96067  |
| ["bob","datascientist"]  | e42cacd863ebf10c6f425874b8eaf7  |
| ["bob","datascientist"]  | 635f445b280626d97c548917fc5bc0  |
| ["bob","datascientist"]  | 07908afd166b63b627eb616c6232de  |
+--------------------------+---------------------------------+
#> kinit alice
Password for alice@CLOUDERA.LOCAL:
#> beeline -e "select current_groups()as groups, c_last_name from tpcds_bin_partitioned_orc_5.customer limit 10" 2>/dev/null
+----------------------------------------+---------------------------------+
|                 groups                 |           c_last_name           |
+----------------------------------------+---------------------------------+
| ["alice","developer","datascientist"]  | 9c2d2409e88bb1351f690444e56c81  |
| ["alice","developer","datascientist"]  | c87bd09bf1868b0222e3ed7627069a  |
| ["alice","developer","datascientist"]  | fac0505ae34b0e78b3a459606a24c3  |
| ["alice","developer","datascientist"]  | 1fc30d54c262e5502e1bff63e36420  |
| ["alice","developer","datascientist"]  | 721e5108bbf13544019ece1061c239  |
| ["alice","developer","datascientist"]  | 7569c3f1c51ef6c897c45b082f17c6  |
| ["alice","developer","datascientist"]  | 1b9c9f7f6877bfc15011e05df96067  |
| ["alice","developer","datascientist"]  | e42cacd863ebf10c6f425874b8eaf7  |
| ["alice","developer","datascientist"]  | 635f445b280626d97c548917fc5bc0  |
| ["alice","developer","datascientist"]  | 07908afd166b63b627eb616c6232de  |
+----------------------------------------+---------------------------------+
#> kinit davide
Password for davide@CLOUDERA.LOCAL:
#> beeline -e "select current_groups()as groups, c_last_name from tpcds_bin_partitioned_orc_5.customer limit 10" 2>/dev/null
+-----------------------------+---------------------------------+
|           groups            |           c_last_name           |
+-----------------------------+---------------------------------+
| ["davide","datascientist"]  | Moran                           |
| ["davide","datascientist"]  | Shipman                         |
| ["davide","datascientist"]  | Gilbert                         |
| ["davide","datascientist"]  | Williams                        |
| ["davide","datascientist"]  | Grimes                          |
| ["davide","datascientist"]  | Baker                           |
| ["davide","datascientist"]  | Hernandez                       |
| ["davide","datascientist"]  | Woods                           |
| ["davide","datascientist"]  | Peterson                        |
| ["davide","datascientist"]  | Fisher                          |
+-----------------------------+---------------------------------+

As we can see, even though bob, alice and davide belong to the same datascientist group, davide can now see the c_last_name column in clear, while bob and alice can’t.

Now let’s tag alice as dpo too:

#> mysql -u user -h localhost -ppassword repository
MariaDB [repository]> update usertags set taglist = 'dpo' where user = 'alice';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

And try the select from Hive again:

#> kinit alice
Password for alice@CLOUDERA.LOCAL:
#> beeline -e "select current_groups()as groups, c_last_name from tpcds_bin_partitioned_orc_5.customer limit 10" 2>/dev/null
+----------------------------------------+---------------------------------+
|                 groups                 |           c_last_name           |
+----------------------------------------+---------------------------------+
| ["alice","developer","datascientist"]  | Moran                           |
| ["alice","developer","datascientist"]  | Shipman                         |
| ["alice","developer","datascientist"]  | Gilbert                         |
| ["alice","developer","datascientist"]  | Williams                        |
| ["alice","developer","datascientist"]  | Grimes                          |
| ["alice","developer","datascientist"]  | Baker                           |
| ["alice","developer","datascientist"]  | Hernandez                       |
| ["alice","developer","datascientist"]  | Woods                           |
| ["alice","developer","datascientist"]  | Peterson                        |
| ["alice","developer","datascientist"]  | Fisher                          |
+----------------------------------------+---------------------------------+

Now we can see that alice is also allowed to select the c_last_name column in clear.
