After installing Ranger and enabling the Hive plugin for Ranger, one of the configuration changes it made was setting hive.server2.enable.doAs=false. Now all jobs run as the "hive" user. Why was it recommended to change this to FALSE?
When we try to drop a table, it throws a permission error. Even though we are logged in as "dwuser", the operation is treated as coming from the "hive" user.
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.security.AccessControlException: Permission denied: user=hive, access=WRITE, inode="/data/insight/dwuser":dwuser:dwuser:drwxr-xr-x
Is there any impact if we change it back to TRUE?
When doAs is set to false, all queries are executed as the hive user. The hive user does not have write access to the location where the table is stored, /data/insight/dwuser. That is why you are seeing that message. You should be able to set doAs back to true without too many issues.
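To make the denial in the error message concrete, here is a minimal Python sketch of a POSIX-style permission check applied to the directory from the error (dwuser:dwuser, drwxr-xr-x). It is only an illustration, not how HDFS or Ranger is implemented: the Inode class, the can_access function, and the "hadoop" group assigned to the hive user are hypothetical, and it ignores HDFS ACLs, superuser privileges, and Ranger HDFS policies.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    path: str
    owner: str
    group: str
    mode: str  # permission string as printed in the error, e.g. "drwxr-xr-x"


def can_access(user, groups, inode, access):
    """Return True if `user` holds `access` ('r', 'w' or 'x') on `inode`.

    Simplified owner/group/other check only (hypothetical helper).
    """
    bits = inode.mode[1:]  # drop the leading file-type character
    if user == inode.owner:
        triad = bits[0:3]  # owner permissions
    elif inode.group in groups:
        triad = bits[3:6]  # group permissions
    else:
        triad = bits[6:9]  # everyone else
    return access in triad


# Directory taken from the error message in the question.
table_dir = Inode("/data/insight/dwuser", "dwuser", "dwuser", "drwxr-xr-x")

# doAs=false: the HDFS operation is performed as the "hive" user.
# The "hadoop" group is an assumption made for this example.
print(can_access("hive", {"hadoop"}, table_dir, "w"))    # False

# doAs=true: the HDFS operation is performed as the end user.
print(can_access("dwuser", {"dwuser"}, table_dir, "w"))  # True
```

In this simplified model the hive user falls into the "other" class (r-x), so a WRITE such as dropping the table directory is refused, while dwuser as owner (rwx) is allowed.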
Because of the way that Hive interacts with HDFS, when doAs is set to true the user running the Hive query needs to have permissions defined properly in both HDFS and Hive (via Ranger). This is typically not a problem for tables that are stored in user home directories, as in your example. However, when tables are managed by Hive in the /user/hive/warehouse directory, you need to remember to grant the user HDFS rights on the location of each specific table. You usually don't want to grant wide-open permissions on /user/hive/warehouse.
Thanks for your reply. It was set to TRUE before the Ranger installation. During the Ranger installation the property was set to FALSE, as recommended by Ambari. What was the reason for setting it to FALSE? Do you have any insight into that? Setting it to FALSE also leads to other problems, such as resource allocation in YARN.
I would be happy if someone could explain the impact of changing it to TRUE.
Thanks in advance.
I'm not 100% positive why Ambari recommends setting it to FALSE. As I indicated above, it is likely because of the extra Ranger policies that you need to create and manage for HDFS in addition to the Hive policies. These extra policies are not intuitive to users, and they can generate a lot of confusion about why some access works and other access does not.
Setting it to TRUE will give you finer-grained access control and auditing. It also ensures better resource management via YARN queues. The main points to consider when setting it to TRUE are these:
1. If you need to manage column-level security in Hive by restricting columns, you still have to ensure the user has HDFS access to the data. The downside is that HDFS access carries no column-level restrictions, so the user can read data directly from HDFS that they may not be allowed to see via Hive.
2. If you are not concerned with column-level restrictions, then there are no downsides to setting doAs to TRUE that I'm aware of.
3. To get proper YARN queue mapping, you need to set doAs to TRUE.
As an alternative, you can use a custom Hive hook to submit a username to get proper YARN Queue utilization. You can read more here: https://community.hortonworks.com/content/idea/9658/hive-support-capacity-scheduler-user-queue-mappi...
With Ranger enabled, we make sure hive.server2.enable.doAs is set to “false” so that permissions on the HDFS files related to Hive can be granted only to the “hive” user, and no one else is able to access those HDFS files directly.
After issuing a hive query, if you check the Ranger Audit logs you will be able to see that the query is running as the original user (dwuser) while the related tasks in HDFS will be executed as the “hive” user.
This blog post describes the different use cases and why you would want to set it to false or true: http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/