Created on 01-15-2016 07:58 AM - edited 08-19-2019 05:13 AM
When I allow user1 to read the col1 column of a Hive table, I add the following policy to the Hive service in Ranger.
However, this is not enough when "Run as end user instead of Hive user" is set to true.
I also have to add a policy to the HDFS service in Ranger.
The following table shows the policies at each ACL layer.
In this case, user1 can read the entire table data with the hdfs command, or with the hive command, which does not go through HiveServer2.
I think Ranger should support column-based ACLs when "Run as end user instead of Hive user" is true.
Created 01-15-2016 09:38 AM
Hi @Junichi Oda - This is expected behaviour, and it is the reason why it is recommended to run all Hive processes as the hive user when you secure Hive with Ranger.
There are two options for securing access to Hive with Ranger:
Solution 1
Use both the HDFS repository and the Hive repository in Ranger to manage permissions
Keep "Run as end user instead of Hive user" enabled (hive.server2.enable.doAs=true)
This means the dual maintenance that you describe
Solution 2
Grant the hive user access to the /apps/hive/warehouse directory tree in the Ranger HDFS repository
Lock down filesystem permissions on HDFS (for example, chmod 750)
Use the Ranger Hive repository to manage permissions on Hive tables
Run queries as the hive user instead of the end user (hive.server2.enable.doAs=false)
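As a rough illustration of the last item in Solution 2, the setting could look like this in hive-site.xml. This is only a sketch: the property name is the one quoted above, and on a managed cluster you would normally change it through the "Run as end user instead of Hive user" option rather than editing the file by hand.

```xml
<!-- hive-site.xml (sketch): HiveServer2 runs queries as the hive service user,
     so HDFS access is checked against "hive" and per-user rights are
     enforced only by the Ranger Hive policies. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
```

HiveServer2 typically has to be restarted for the change to take effect.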
---
Solution 2 is the way to go. You may be concerned about auditability, but the Hive audits in Ranger will show the correct end user. The HDFS and YARN audits will still show "hive", but you will be able to tell who ran the query from the Hive audits.
Created 01-16-2016 12:23 PM
Thank you very much for your reply and the very helpful solutions.
I'd rather not manage both the HDFS and Hive repositories if I can avoid it.
However, we manage Hadoop resources through the YARN queue assigned to each user.
For this reason, I would like to keep "Run as end user instead of Hive user" enabled (hive.server2.enable.doAs=true).