Ranger should support column based ACL in case "Run as end user instead of Hive user = true"
- Labels: Apache Ranger
Created on ‎01-15-2016 07:58 AM - edited ‎08-19-2019 05:13 AM
To allow user1 to read only the col1 column of a table in Hive, I add the following policy to the Hive service in Ranger.
However, this is not enough when "Run as end user instead of Hive user" is true.
I also have to add a policy to the HDFS service in Ranger.
The following table shows the policies at each ACL layer.
In this case, user1 can access the entire table's data with the hdfs command, or with the hive command without going through HiveServer2.
I think Ranger should support column-based ACLs when "Run as end user instead of Hive user" is true.
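To make the gap concrete, here is a toy model of the two ACL layers. It is only a sketch: it does not call Ranger or Hadoop and does not reflect Ranger's real policy evaluation. The table name table1, its path, and all function and variable names are hypothetical.

```python
# Toy model of the two ACL layers; it does not call Ranger or Hadoop.
# The table name, its path, and every identifier here are hypothetical.

TABLE_COLUMNS = ["col1", "col2", "col3"]
TABLE_PATH = "/apps/hive/warehouse/table1"

# Hive-layer policy: user1 may select only col1.
HIVE_COLUMN_POLICY = {"user1": {"col1"}}

# HDFS-layer policy: with doAs=true the query runs as user1, so user1
# needs read access to the table's files, which hold every column.
HDFS_PATH_POLICY = {"user1": {TABLE_PATH}}


def columns_via_hiveserver2(user):
    """Columns visible when the query goes through HiveServer2."""
    allowed = HIVE_COLUMN_POLICY.get(user, set())
    return [c for c in TABLE_COLUMNS if c in allowed]


def columns_via_hdfs(user):
    """Columns visible when the files are read directly from HDFS."""
    if TABLE_PATH in HDFS_PATH_POLICY.get(user, set()):
        return list(TABLE_COLUMNS)  # file-level access exposes all columns
    return []


print(columns_via_hiveserver2("user1"))  # ['col1']
print(columns_via_hdfs("user1"))         # ['col1', 'col2', 'col3']
```

The second result is the problem: the HDFS policy that doAs=true requires cannot express "col1 only", so the column restriction holds only for access through HiveServer2.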
Created ‎01-15-2016 09:38 AM
Hi @Junichi Oda - This is expected behaviour, and it is the reason we recommend running all Hive processes as the hive user when you secure Hive with Ranger.
There are two options for securing access to Hive with Ranger:
Solution 1
Use both the HDFS and Hive repositories in Ranger to manage permissions
Keep "run as end user instead of hive" (hive.server2.enable.doAs=true)
This means the dual maintenance that you describe
Solution 2
Grant the hive user access to the /apps/hive/warehouse directory tree in the Ranger HDFS repository
Lock down filesystem permissions on HDFS (for example, chmod 750)
Use the Ranger Hive repository to manage permissions on Hive tables
Run as hive instead of end user (hive.server2.enable.doAs=false)
---
Solution 2 is the way to go. You may be concerned about auditability, but the Hive audits in Ranger will show the correct end user. The HDFS and YARN audits will still show "hive", but you will be able to tell who ran the query from the Hive audits.
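As a rough sketch of the two non-Ranger pieces of Solution 2, the snippet below prints the hive-site.xml property for hive.server2.enable.doAs=false and the HDFS shell command for the chmod 750 lockdown. The helper names are hypothetical, the recursive -R flag is my assumption (the steps above only say chmod 750), and on a managed cluster you would normally change the property through your management UI rather than edit the file by hand.

```python
import shlex
import xml.etree.ElementTree as ET

WAREHOUSE_DIR = "/apps/hive/warehouse"  # path named in Solution 2


def doas_property(enabled):
    """Build the hive-site.xml <property> element for hive.server2.enable.doAs."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hive.server2.enable.doAs"
    ET.SubElement(prop, "value").text = "true" if enabled else "false"
    return prop


def lockdown_command(path=WAREHOUSE_DIR, mode="750"):
    """Return the HDFS shell command (as an argument list) that restricts the tree.

    The -R (recursive) flag is an assumption, not part of the original advice.
    """
    return ["hdfs", "dfs", "-chmod", "-R", mode, path]


# Solution 2 runs queries as the hive user, so doAs is disabled.
print(ET.tostring(doas_property(False), encoding="unicode"))
print(shlex.join(lockdown_command()))
```

This only prints the fragment and the command; nothing is applied to the cluster. The chmod must be run by a user allowed to change permissions on the warehouse directory, and the Ranger HDFS and Hive policies still have to be created in Ranger itself.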
Created ‎01-16-2016 12:23 PM
Thank you very much for your reply and the very helpful solutions.
I'd rather not manage both the HDFS and Hive repositories if I can avoid it.
However, we manage Hadoop resources through the YARN queue assigned to each user.
For this reason, I would like to keep "run as end user instead of hive" (hive.server2.enable.doAs=true).
