How to manage Hive warehouse HDFS directory permission?

Contributor

I have an HDP 2.6 cluster. I would like to control access to Hive tables through Ranger, and I would also like to run my queries as the end user. I followed the HDP documentation for Ranger and set the permissions of the directory /apps/warehouse/hive to 000.

What I noticed is that access is not decided solely by the Ranger policies created for Hive (databases and tables). Even if a user has WRITE permission defined in a Ranger Hive policy, the user still needs WRITE permission on the corresponding table's directory in HDFS. If my database has 1000+ tables and a user needs WRITE permission for only 200 of them, then I have to create Ranger HDFS policies that grant the user WRITE permission on those 200 directories.
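To illustrate the bookkeeping described above, here is a minimal Python sketch that builds a single Ranger HDFS policy covering many table directories, rather than one policy per table. It only constructs the JSON body; it does not contact Ranger. The function name `build_hdfs_policy`, the service name `cl1_hadoop`, the user `alice`, and the `sales.db` directory are hypothetical. The sketch also assumes that each table lives in a directory named after the table under the database directory, and that the body would be posted to Ranger's public REST API (`/service/public/v2/api/policy`); verify both against your own cluster.

```python
import json
import posixpath


def build_hdfs_policy(service, policy_name, db_dir, tables, user):
    """Build one Ranger HDFS policy body that covers many table directories.

    Assumption: each table is stored at <db_dir>/<table>.
    """
    paths = [posixpath.join(db_dir, table) for table in tables]
    return {
        "service": service,  # name of the HDFS service as registered in Ranger
        "name": policy_name,
        "isEnabled": True,
        "resources": {
            "path": {"values": paths, "isRecursive": True, "isExcludes": False}
        },
        "policyItems": [
            {
                "users": [user],
                "accesses": [
                    {"type": "read", "isAllowed": True},
                    {"type": "write", "isAllowed": True},
                    {"type": "execute", "isAllowed": True},
                ],
                "delegateAdmin": False,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = build_hdfs_policy(
        service="cl1_hadoop",
        policy_name="sales_tables_rw_alice",
        db_dir="/apps/warehouse/hive/sales.db",
        tables=["orders", "items"],
        user="alice",
    )
    print(json.dumps(policy, indent=2))
```

Listing the 200 directories in one policy keeps the grant reviewable in one place, but the Hive policy and the HDFS policy still have to be kept in sync by hand.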

I could grant WRITE permission at the database level; however, I am worried that the user could then remove files belonging to other tables from the command line.


Expert Contributor

You should set the permissions of the Hive warehouse directory to 700 instead of 000, so that normal users are unable to access the secured tables directly, and let Ranger control access through the Hive policies.

In addition, you will need to set 'hive.warehouse.subdir.inherit.perms=true', which ensures that newly created tables inherit the 700 permission.

Hope that helps.

Contributor

Thanks @dsun for the quick reply. The question was posted multiple times and I am unable to remove the duplicates.

I changed the Hive warehouse directory permissions and added hive.warehouse.subdir.inherit.perms. The user has all permissions in the Ranger Hive policy, but I am still unable to create a database. Please suggest what to check.

20488-hive-error.png

Expert Contributor

Did you set up proper Hive resource access policies for the users/groups in Ranger? Here is a good tutorial on how to set them up in Ranger:

https://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/

Contributor

@dsun

I followed the link and set up the Ranger policies correctly (scenario 2, run as end user). I am able to create a Hive database or table only if I also have write permission on the warehouse directory (through an HDFS policy).

I understand that Hive just sits on top of the HDFS file system and does not own the directory. But what I expect is that Ranger deduces the file permissions from the level of permission the user has in Hive.

"If my database has 1000+ tables and a user needs WRITE permission only for 200 tables, then I have to create ranger HDFS policy(s) for those 200 directories with WRITE permission to the user."

Contributor

Ideally, Ranger should take care of HDFS file permissions based on the level of permission a user has on the Hive tables.

Thanks @dsun. Yes, we can disable doAs and let the hive user have the required permissions. However, I would like to enable HDFS encryption for the Hive warehouse database directories, where only a few users will have access to each encryption key. If I disable doAs, I need to give the hive user access to all keys. I would like to enforce key access per end user instead.

I was able to achieve this through Ranger tag-based policies, in which I define both the Hive and the HDFS permissions. Currently, the HDFS path of a managed Hive table is not synchronized to Atlas, so I created a custom hook for this. With that in place, it is easy to manage policies and permissions.

Expert Contributor

One approach you can take is to disable Hive impersonation: set 'hive.server2.enable.doAs=false' in the Hive configs. Queries then access HDFS as the 'hive' user, so only the 'hive' user needs permissions on the Hive-related HDFS folders, and other users cannot access the HDFS files directly.

In your case, I assume you have doAs set to true, so the user running the Hive query must have permissions defined for both HDFS and Hive in Ranger. That can be an issue if you have many tables: all your tables are managed under the hive/warehouse directory rather than under users' home folders, so for each table you need to grant the user permissions on the table location through an HDFS policy in Ranger.

Even with 'doAs' set to false, you will still be able to see the actual user in the Ranger audit logs; only the HDFS-related tasks run as the 'hive' user.
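As a rough illustration of the difference between the two doAs settings, the following Python sketch models which identity each layer sees. It is a toy model, not Hive or Ranger code; the function name `effective_identities` and the user `alice` are hypothetical.

```python
def effective_identities(end_user, do_as, service_user="hive"):
    """Return the identity seen by each layer for a HiveServer2 query.

    Toy model of hive.server2.enable.doAs:
    - Hive authorization is checked for the end user in both modes.
    - HDFS is accessed as the end user only when impersonation (doAs) is on;
      otherwise it is accessed as the Hive service user.
    """
    hdfs_user = end_user if do_as else service_user
    return {"hive_authorization": end_user, "hdfs_access": hdfs_user}


if __name__ == "__main__":
    for do_as in (True, False):
        print(f"doAs={do_as}: {effective_identities('alice', do_as)}")
```

With doAs enabled, the end user therefore needs matching HDFS permissions for every table location; with doAs disabled, only the 'hive' user does.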

20489-ambari-scregionde.png

Expert Contributor

Please don't forget to 'accept' the answer if it helped, thanks.