Expert Contributor
Posts: 162
Registered: ‎09-29-2014

hive impersonation and sentry


HiveServer2 impersonation must be turned off. HiveServer2 impersonation lets users execute queries and access HDFS files as the connected user rather than as the super user. Access policies are applied at the file level using the HDFS permissions specified in ACLs (access control lists). Enabling HiveServer2 impersonation bypasses Sentry from the end-to-end authorization process. Specifically, although Sentry enforces access control policies on tables and views within the Hive warehouse, it does not control access to the HDFS files that underlie the tables. This means that users without Sentry permissions to tables in the warehouse may nonetheless be able to bypass Sentry authorization checks and execute jobs and queries against tables in the warehouse as long as they have permissions on the HDFS files supporting the table.

The text above is from the documentation. I just wonder why "Enabling HiveServer2 impersonation bypasses Sentry from the end-to-end authorization process"? Can anyone give some advice? Thanks.

Cloudera Employee
Posts: 724
Registered: ‎03-23-2015

Re: hive impersonation and sentry

The point of Sentry is to allow only users with specific permissions to access certain things. To do that, Sentry needs to manage everything itself rather than leaving it to end users.

This is why the Hive warehouse needs to be owned by "hive:hive" with permissions 771, so that no end user can modify anything that Hive and Sentry control.
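As a rough sketch of that ownership setup, the two `hdfs dfs` commands can be built as follows. The path `/user/hive/warehouse` is an assumption (the usual default location), and the helper names `lockdown_commands` and `apply_lockdown` are made up for illustration.

```python
import subprocess

# Assumption: the default warehouse location; adjust to your cluster.
WAREHOUSE = "/user/hive/warehouse"


def lockdown_commands(path=WAREHOUSE):
    """Return the hdfs CLI invocations as argv lists, without running them."""
    return [
        # Make the hive service user and group the owner of everything.
        ["hdfs", "dfs", "-chown", "-R", "hive:hive", path],
        # rwx for owner and group, execute-only for everyone else.
        ["hdfs", "dfs", "-chmod", "-R", "771", path],
    ]


def apply_lockdown(path=WAREHOUSE):
    """Run the commands; this needs to be executed as an HDFS superuser."""
    for argv in lockdown_commands(path):
        subprocess.run(argv, check=True)
```

Calling `lockdown_commands()` only builds the commands, so you can review them before running `apply_lockdown()` on a real cluster.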

Enabling impersonation causes end users to create files owned by themselves. The "hive" user is then unable to manage those files and directories, and the users have direct access to them. That breaks what Sentry is designed to do.
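To illustrate the point, here is a toy model of the owner/group/other permission check, not real HDFS or Sentry code. The names `HdfsEntry`, `allowed` and `table_dir_created_by` are hypothetical, the 755 mode for user-created directories is an assumed default, and superuser and ACL handling are ignored.

```python
from dataclasses import dataclass

READ, WRITE = 0o4, 0o2


@dataclass(frozen=True)
class HdfsEntry:
    """A simplified HDFS directory: owner, group, and octal mode."""
    owner: str
    group: str
    mode: int


def allowed(entry, user, groups, bit):
    """Pick the owner, group, or other permission bits, then test one bit."""
    if user == entry.owner:
        bits = (entry.mode >> 6) & 0o7
    elif entry.group in groups:
        bits = (entry.mode >> 3) & 0o7
    else:
        bits = entry.mode & 0o7
    return bool(bits & bit)


def table_dir_created_by(end_user, user_group, impersonation):
    """Model who owns a new table directory, with and without impersonation."""
    if impersonation:
        # HiveServer2 acts as the connected user, so that user owns the files.
        # Assumption: 755 stands in for whatever the user's umask produces.
        return HdfsEntry(owner=end_user, group=user_group, mode=0o755)
    # HiveServer2 acts as the "hive" service user inside a 771 warehouse.
    return HdfsEntry(owner="hive", group="hive", mode=0o771)
```

In this model, with impersonation on, "alice" can read her table directory straight from HDFS without any Sentry check, and "hive" cannot write to it. With impersonation off, it is the other way around, so access has to go through HiveServer2 and Sentry.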

Hope the above makes sense.

Cheers
New Contributor
Posts: 2
Registered: ‎05-09-2019

Re: hive impersonation and sentry

But this makes things harder from an admin's perspective: jobs submitted from Hue run as the "hive" user on YARN.
Also, most users create external tables for their work and store them under their own HDFS paths, so setting the path ownership to user:usergrp prevents the "hive" user (with impersonation disabled) from writing to those paths when queries come from Hue.

So do we have to set ACLs for everyone every time?
And will the ownership of every subdirectory change?
What if the user is running on beeline? Do we still change the path ownership to hive:hive?
Cloudera Employee
Posts: 724
Registered: ‎03-23-2015

Re: hive impersonation and sentry

Hi Sona,

>>> But this makes things harder from an admin's perspective: jobs submitted from Hue run as the "hive" user on YARN.

The jobs will be submitted under the queue that is configured in the cluster, so resources can still be controlled based on the end users, not the "hive" user.

>>> Also, most users create external tables for their work and store them under their own HDFS paths, so setting the path ownership to user:usergrp prevents the "hive" user (with impersonation disabled) from writing to those paths when queries come from Hue.

All HDFS paths where you store data for Hive databases and tables should be owned by "hive". Permissions for end users should be handled through Sentry's HDFS sync: you grant privileges to end users via Sentry, and the corresponding ACLs are synced to HDFS. That way everything is managed by Hive/Sentry, and Hive/Sentry can give permissions to end users.

>>> So do we have to set ACLs for everyone every time?
You can set it up at the DB level, so there is no need to set it for every table.
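
As a minimal sketch of a DB-level setup, the helper below generates the Sentry statements you would run from beeline as a Sentry admin. The function name `db_level_grants` and the role, group and database names in the usage note are placeholders, not names from this thread.

```python
import re

# Plain identifiers only; names with other characters would need quoting.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_PRIVILEGES = {"ALL", "SELECT", "INSERT"}


def db_level_grants(role, group, database, privilege="ALL"):
    """Build statements that grant one privilege on a whole database to a group."""
    for name in (role, group, database):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if privilege not in _PRIVILEGES:
        raise ValueError(f"unsupported privilege: {privilege!r}")
    return [
        f"CREATE ROLE {role};",
        # Sentry roles are granted to groups, not to individual users.
        f"GRANT ROLE {role} TO GROUP {group};",
        # One grant covers every table in the database.
        f"GRANT {privilege} ON DATABASE {database} TO ROLE {role};",
    ]
```

For example, `db_level_grants("alpha_dev_role", "alpha_dev", "alpha", "SELECT")` returns three statements that give the whole `alpha_dev` group read access to every table in `alpha`.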

>>> And will the ownership of every subdirectory change?
Yes

>>> What if the user is running on beeline? Do we still change the path ownership to hive:hive?

After enabling Sentry, you should have switched to beeline already. Hive CLI is deprecated and will not work properly in a Sentry-enabled environment.

Hope the above helps.

Cheers
Eric
New Contributor
Posts: 2
Registered: ‎05-09-2019

Re: hive impersonation and sentry

Hi Eric,

Thanks for the reply.
(1) In the resource pool, submission access control is set by group name, so when a user from the group submits a job through Hue, YARN shows the username that submitted the job as "hive". Only once the job has completed can I see who actually submitted it. Also, if it is a Spark job or another large job, I am unable to alert the user before killing it, which makes monitoring very difficult. So how can I clearly see who submitted a job when it shows "hive" everywhere?

(2) Hive databases are stored in /user/hive/warehouse/db*, yet users are creating tables as *external tables* in their own HDFS paths under /Project/Alpha/Table/.., and in those paths users are divided by *dev*sit*prd and so on. Besides external tables, other files are also stored at the same paths. So are you suggesting that I leave the ownership as hive:hive everywhere and let the Sentry roles define who can access what?


Regards,
Sona