Created on 06-26-2015 03:35 PM - edited 09-16-2022 02:32 AM
Hi
I have enabled Sentry to work with HiveServer2 with Kerberos authentication. Therefore, impersonation on HiveServer2 is turned off.
Now all queries are run as 'hive', both from the Hue Hive UI and from Oozie Hive actions.
How does resource management (YARN resource pools) work in this case? I want jobs to go into the right pool, but now all Hive jobs are going into the root.hive pool.
The same thing happens with Impala when using Llama. All Impala jobs go into the root.llama pool.
Thank you
Ben
Created 07-07-2015 12:27 AM
The fact that the job runs as the hive user is correct. You turned impersonation off when you turned on Sentry, or at least that is what you should have done. The hive user is thus the user that executes the job.
However, the end user should be used to determine which queue the application is submitted to (if you use the FairScheduler). This does require some configuration on your side to make it work. There is a Knowledge Base article in our support portal on how to set that up for CM and non-CM clusters. Search for "Hive FairScheduler".
I remember providing the steps using CM on the forum before:
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hive/fsxml/fair-scheduler.xml</value>
</property>
NOTE: you must have the following rule as the first rule in the placement policy:
<rule name="specified" />
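As a rough illustration of where that rule goes, a placement policy in the allocation file might look like the sketch below. Only the `specified` rule comes from the note above. The `create="false"` attribute and the `user` and `default` rules after it are assumptions added to make the fragment complete, and the file that CM generates for your cluster may differ.

```xml
<?xml version="1.0"?>
<allocations>
  <queuePlacementPolicy>
    <!-- From the note above: "specified" must be the first rule. -->
    <rule name="specified" create="false" />
    <!-- Assumed fallback rules, not part of the original steps. -->
    <rule name="user" create="false" />
    <rule name="default" />
  </queuePlacementPolicy>
</allocations>
```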
Wilfred
Created on 12-08-2017 02:24 AM - edited 12-08-2017 02:27 AM
Hi, we have a similar issue. I use CDH 5.6.0. Is this bug fixed? I have no idea about the right way to make it work.
I would prefer to use the Placement Rules configured in YARN instead of this workaround.
Our situation is quite similar to Madhu's. I have set up 3 queues, and there are 3 groups in LDAP (these 3 groups also exist in the Linux OS). Is there a way to control which users can submit to specific resource pools in Hive?
Thanks,
ywheel
Created 12-08-2017 03:35 AM
This has been fixed in later releases of Cloudera Manager and CDH. When you manage the cluster through CM, the config, and any later changes to it, will be automatically deployed to HiveServer2.
Also, don't forget to make sure that the hive user has permission to submit to all queues. The simplest way is to add hive to the submit ACL of the root queue.
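In allocation-file terms, adding hive to the root submit ACL could look like the sketch below. This is a minimal fragment under assumptions: on a CM-managed cluster you would normally set this through the resource pool settings rather than by editing XML, and the generated file may look different. The `aclSubmitApps` value is a comma-separated user list, a space, then a comma-separated group list.

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <!-- Users first, then a space, then groups.
         Here: the hive user and no groups. -->
    <aclSubmitApps>hive </aclSubmitApps>
  </queue>
</allocations>
```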
Wilfred
Created 12-08-2017 03:48 AM
Thanks a lot!
Since which version of CDH and Cloudera Manager has this been fixed? I'm on CDH 5.6.0, and it doesn't seem to work yet.
Created 12-08-2017 03:50 AM
You must be on 5.8.0 or later for both CDH and CM
Wilfred
Created 12-08-2017 03:53 AM
Created 12-19-2017 05:46 PM
Hi, we have an issue similar to Madhu's. By the way, we are using CDH 5.12.0.
The following is Madhu's description:
We have our cluster Kerberized and we also deployed Sentry. As part of the setup in Hive we disabled impersonation, so all the Hive queries are being executed by the hive user.
We configured dynamic resource manager pools, setting up 3 queues: HighPriority, LowPriority and Default.
Everybody can submit jobs to the Default queue, and that is working as expected.
Access to HighPriority and LowPriority is managed by membership in two different AD groups.
I assigned a test user to both groups so it could submit jobs to both queues (HighPriority, LowPriority). When I submitted a job
we got the following error message:
ERROR : Job Submission failed with exception 'java.io.IOException(Failed to run job : User hive cannot submit applications to queue root.HighPriority)'
java.io.IOException: Failed to run job : User hive cannot submit applications to queue root.HighPriority
This is correct because the hive user is not a member of any of those groups.
I modified the submission access control to add the hive user to the pool, and this time the job completed. However, that breaks the access control model we are trying to implement, because now all Hive users can make use of both pools even though they don't belong to any of the AD groups that are supposed to control who can submit jobs to the pool.
Is there a way to control which users can submit to specific resource pools in Hive and leverage the AD groups created for this purpose?
Created 12-19-2017 11:29 PM
The placement rules are executed as the original user. That means the job will be added to the correct pool. The end user cannot override that because the mapred.job.queue.name property should be blacklisted.
The hive user should never be accessible to any end user. It is a service principal, and allowing end users to use it will give you far bigger issues.
I thus do not see how adding hive as a user to the ACL breaks it.
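The blacklisting mentioned above can be sketched with Hive's `hive.conf.restricted.list` property in the HiveServer2 configuration. This is an assumption about how the restriction is applied, not a confirmed description of what CM deploys. Listing both the old and the new queue property name is also an assumption, and any entries already present in your restricted list should be kept.

```xml
<property>
  <name>hive.conf.restricted.list</name>
  <!-- Assumed values: both spellings of the MapReduce queue property.
       Append these to the entries already configured; do not replace them. -->
  <value>mapred.job.queue.name,mapreduce.job.queuename</value>
</property>
```

With the property restricted, a `set mapred.job.queue.name=...;` statement in a session should be rejected, so the pool is chosen by the placement rules alone.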
Wilfred
Created 12-20-2017 07:26 AM
Hi Wilfred,
I'm sorry to ask again, but I'm facing the same problem and I don't understand how to configure the Dynamic Resource Pool Configuration to work with the original user's groups (mine, not hive's).
I'm using CDH 5.13 with Kerberos and Sentry. As I am using Sentry, impersonation is disabled.
My configuration is:
root
|--A
|--B
On root, the submission ACL is set to allow only the "sentry" user to submit to this pool.
On A, the submission ACL is set to allow only group A to submit to this pool.
On B, the submission ACL is set to allow only group B to submit to this pool.
Placement rules are:
1 - "Use the pool Specified at run time, only if the pool exists."
2 - "Use the pool root.[username] and create the pool if it does not exist."
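Expressed as a Fair Scheduler allocation file, the setup above would look roughly like the sketch below. The pool names, the group names A and B, and the sentry user are taken from the description. The exact XML that CM generates is an assumption and may differ.

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="root">
    <!-- Users first, then a space, then groups. Only the sentry user here. -->
    <aclSubmitApps>sentry</aclSubmitApps>
    <queue name="A">
      <!-- Leading space: no users, only group A. -->
      <aclSubmitApps> A</aclSubmitApps>
    </queue>
    <queue name="B">
      <!-- Leading space: no users, only group B. -->
      <aclSubmitApps> B</aclSubmitApps>
    </queue>
  </queue>
  <queuePlacementPolicy>
    <!-- Rule 1: pool specified at run time, only if it exists. -->
    <rule name="specified" create="false" />
    <!-- Rule 2: root.[username], created if missing. -->
    <rule name="user" create="true" />
  </queuePlacementPolicy>
</allocations>
```

In the Fair Scheduler, a user may submit to a queue if they are in the ACL of that queue or of any of its ancestors, so an entry on root also applies to A and B.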
When I submit a query as a user from group A, using Hue and setting "set mapred.job.queue.name=A;", I get the error: "User hive cannot submit applications to queue root.A"
If I add hive to the allowed users on root, the query works fine, but both A and B users can submit queries to either pool.
If I add hive only to the "A" resource pool, then users from both group A and group B can submit queries to resource pool A, but no one can submit to resource pool B.
Maybe I am missing an important part, but I don't get the behavior you explained, and if I add hive as an authorized user it will break the ACLs, as every user could then use all the resource pools.
Can you give us the correct configuration to get the same behavior as yours?