Created on 06-26-2015 03:35 PM - edited 09-16-2022 02:32 AM
Hi
I have enabled Sentry to work with HiveServer2 with Kerberos authentication, so impersonation on HiveServer2 is turned off.
Now all queries run as 'hive' from the Hue Hive UI and from the Oozie Hive action.
How does resource management (YARN resource pools) work in this case? I want jobs to go into the right pool, but right now all Hive jobs go into the root.hive pool.
The same thing happens with Impala when using Llama: all Impala jobs go into the root.llama pool.
Thank you
Ben
Created 07-06-2015 06:17 AM
Sorry for the late response, Darren.
I'm using CDH 5.4.1.
This doesn't happen from the command line. If I'm authenticated as ben in the shell environment, the job gets submitted as ben.
In the Hue + Oozie environment, if I submit a workflow job, the Oozie job launcher gets submitted as the authenticated user ben. However, the actual Hive job gets submitted as the hive user.
Thank you.
Ben
Created 07-06-2015 06:38 AM
What's the issue-tracking URL for the 5.2.1 release? I can't find it on Google 😞
Created 07-06-2015 04:04 PM
I tested with both hive and beeline, and running from the command line works as intended: jobs get assigned to the correct user/group queues.
Can you explain why it's OK for Hive jobs to get submitted as the 'hive' user?
We have four different teams using Cloudera, and it gets difficult to manage resources if all Hive jobs go to the "root.hive" queue. And since the "root.hive" queue has limited resources allocated, most Hive jobs will fail.
This is our job history:
application_1436195699910_0031 | hive | INSERT INTO TABLE ... (Stage-1) | MAPREDUCE | root.hive | Mon Jul 6 15:44:38 -0500 2015 | Mon Jul 6 15:45:11 -0500 2015 | FINISHED | SUCCEEDED
application_1436195699910_0030 | ben | oozie:launcher:T=hive2:W=JobName:A=hive2-6df2:ID=0000004-150706101622653-oozie-oozi-W | MAPREDUCE | root.infra | Mon Jul 6 15:44:22 -0500 2015 | Mon Jul 6 15:45:21 -0500 2015 | FINISHED | SUCCEEDED
Other workflow actions such as Sqoop/Pig run in the correct user/group queue.
I think this is a problem with our cluster configuration, but please guide us in the right direction 🙂
Thank you for your help.
Ben
Created 07-07-2015 12:27 AM
The fact that the job runs as the hive user is correct: you turned impersonation off when you turned on Sentry, or at least that is what you should have done. The hive user is thus the user that executes the job.
However, the end user should be used to determine which queue the application is submitted to (if you use the FairScheduler). This does require some configuration on your side to make it work. There is a Knowledge Base article in our support portal on how to set that up for CM and non-CM clusters; search for "Hive FairScheduler".
I remember already providing the steps using CM before on the forum:

<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hive/fsxml/fair-scheduler.xml</value>
</property>

NOTE: you must have the following rule as the first rule in the placement policy:
<rule name="specified" />
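Put together, the placement policy in /etc/hive/fsxml/fair-scheduler.xml might look like the sketch below. This is an assumption-laden example, not the KB article's exact file: the queue name and the rules after "specified" are placeholders that should mirror whatever your Dynamic Resource Pools actually define.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Example pool only; match the pools defined in your cluster. -->
  <queue name="infra">
    <weight>1.0</weight>
  </queue>
  <queuePlacementPolicy>
    <!-- "specified" must be first, so an explicitly requested queue wins. -->
    <rule name="specified" />
    <!-- Assumed fallbacks: place by the end user's primary group, else default. -->
    <rule name="primaryGroup" create="false" />
    <rule name="default" />
  </queuePlacementPolicy>
</allocations>
```

With "specified" first, a job that names its target queue keeps it; only jobs with no queue set fall through to the later rules.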
Wilfred
Created 07-09-2015 07:46 AM
Tara!
Thank you very much for your help. Now I understand that the job runs as the hive user but still goes to the designated queue. After following your steps it worked 🙂
Initially I had changed the Placement Rules on the resource pools and did not have the "specified" pool as the first rule.
Do I need to replace the local /etc/hive/fsxml/fair-scheduler.xml every time I make changes to the "Dynamic Resource Pools"? I'm using a CM cluster.
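If the local file does need to be refreshed by hand, one hypothetical approach is a small helper that copies the newest CM-generated fair-scheduler.xml into place. The glob in the commented example is an assumption about CM's process-directory layout and should be verified on your own cluster:

```shell
#!/bin/sh
# Hypothetical helper: copy the newest file matching SRC_GLOB to DEST.
sync_fair_scheduler() {
  src_glob=$1
  dest=$2
  # Intentional glob expansion; newest file (by mtime) first.
  latest=$(ls -t $src_glob 2>/dev/null | head -n 1)
  [ -n "$latest" ] && cp "$latest" "$dest"
}

# Example invocation (paths are assumptions; verify them on your cluster):
# sync_fair_scheduler \
#   '/var/run/cloudera-scm-agent/process/*-yarn-RESOURCEMANAGER/fair-scheduler.xml' \
#   /etc/hive/fsxml/fair-scheduler.xml
```

Running something like this after each Dynamic Resource Pools change would keep the HiveServer2 copy in step, assuming the file really is regenerated under the CM process directory.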
Best,
Ben
Created 12-09-2015 08:44 AM
Hi, we have a similar issue and are wondering if the steps listed above are the resolution.
We have our cluster Kerberized and we also deployed Sentry; as part of the Hive setup we disabled impersonation, so all Hive queries are executed as the 'hive' user.
We configured Dynamic Resource Pools, setting up three queues: HighPriority, LowPriority, and Default.
Everybody can submit jobs to the Default queue; that is working as expected.
The HighPriority and LowPriority queues are managed by membership in two different AD groups.
I assigned a test user to both groups so it could submit jobs to both queues (HighPriority and LowPriority). When I submitted a job,
we got the following error message:
ERROR : Job Submission failed with exception 'java.io.IOException(Failed to run job : User hive cannot submit applications to queue root.HighPriority)'
java.io.IOException: Failed to run job : User hive cannot submit applications to queue root.HighPriority
This is correct, because the hive user is not a member of either of those groups.
I modified the submission access control to add the hive user to the pool, and this time the job completed. However, that breaks the access control model we are trying to implement, because now all Hive users can use both pools even though they don't belong to any of the AD groups that are supposed to control who can submit jobs to the pool.
Is there a way to control which users can submit to specific resource pools in Hive and leverage the AD groups created for this purpose?
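For reference, the setup described above corresponds to FairScheduler queue ACL entries roughly like the sketch below. The group names are placeholders for the AD groups; as the error message shows, YARN evaluates these ACLs against the submitting user, which is hive once impersonation is off:

```xml
<allocations>
  <queue name="HighPriority">
    <!-- aclSubmitApps format is "users groups"; the leading space means
         "no individual users, only this group". Group name is a placeholder. -->
    <aclSubmitApps> highpriority_ad_group</aclSubmitApps>
  </queue>
  <queue name="LowPriority">
    <aclSubmitApps> lowpriority_ad_group</aclSubmitApps>
  </queue>
  <queue name="default">
    <!-- "*" lets everyone submit, matching the described Default queue. -->
    <aclSubmitApps>*</aclSubmitApps>
  </queue>
</allocations>
```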