Ranger should support column-based ACLs when "Run as end user instead of Hive user" = true

To allow user1 to read the col1 column of a Hive table, I add the following policy to the Hive service in Ranger.

1381-スクリーンショット-2016-01-15-165038.png

However, this is not enough when "Run as end user instead of Hive user" is set to true.

In that case I also have to add a policy to the HDFS service in Ranger.

The following table shows the policies at each ACL layer.

1382-スクリーンショット-2016-01-15-165544.png

In this case, user1 can access the entire table data with the hdfs command, or with the hive command that bypasses HiveServer2.

I think Ranger should support column-based ACLs when "Run as end user instead of Hive user" is true.

1 ACCEPTED SOLUTION

Hi @Junichi Oda - This is expected behaviour, and it is the reason why running all Hive processes as the hive user is recommended when you secure Hive with Ranger.

There are two options for securing access to Hive with Ranger:

Solution 1

Use both the HDFS and the Hive repositories in Ranger to handle rights

Keep "run as end user instead of hive" (hive.server2.enable.doAs=true)

This entails the dual maintenance that you describe

Solution 2

Give the hive user rights on the /apps/hive/warehouse directory tree in the Ranger HDFS repository

Lock down filesystem permissions on HDFS (for example, chmod 750)

Use the Ranger Hive repository to handle rights on Hive tables

Run as hive instead of end user (hive.server2.enable.doAs=false)
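For reference, the last step of Solution 2 corresponds to a single property. The fragment below is a minimal sketch, assuming the property is set directly in hive-site.xml; on a cluster managed through a UI, the same value is normally changed with the "Run as end user instead of Hive user" toggle rather than by editing the file by hand.

```xml
<!-- hive-site.xml fragment (sketch): HiveServer2 runs queries as the hive
     service user instead of impersonating the connecting end user. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```

HiveServer2 typically needs a restart for the change to take effect. Setting the value back to true gives the Solution 1 behaviour.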

---

Solution 2 is the way to go. You may be concerned about auditability, but the Hive audits in Ranger will show the correct user. The HDFS and YARN audits will still show "hive", but you will be able to tell who ran the query.

Thank you very much for your reply and very helpful solutions.

I'd rather not manage both the HDFS and the Hive repositories if I can avoid it.

However, we manage Hadoop resources through the YARN queue assigned to each user.

For this reason, I would like to keep "run as end user instead of hive" (hive.server2.enable.doAs=true).