
sentry + hive + kerberos resource management

Rising Star

Hi

 

I have enabled Sentry to work with HiveServer2 with Kerberos authentication, so impersonation on HiveServer2 is turned off.

Now all queries run as 'hive', whether submitted from the Hue Hive UI or from an Oozie Hive action.

 

How does resource management (YARN resource pools) work in this case? I want jobs to go into the right pool, but currently all Hive jobs go into the root.hive pool.

 

The same thing happens with Impala when using Llama: all Impala jobs go into the root.llama pool.

 

Thank you

Ben

1 ACCEPTED SOLUTION

Super Collaborator

The fact that the job runs as the hive user is correct. Impersonation is turned off when you turn on Sentry, or at least that is how it should be configured, so the hive user is the one that executes the job.

However, the end user should still be used to determine which queue the application is submitted to (if you use the FairScheduler). This does require some configuration on your side. There is a Knowledge Base article in our support portal on how to set this up for CM and non-CM clusters. Search for "Hive FairScheduler".

 

 

I remember having provided the CM steps on the forum before:

 

  1. Login to Cloudera Manager
  2. Navigate to Cluster > Yarn > Instances > ResourceManager > Processes
  3. Click on the link fair-scheduler.xml, this will open a new tab or window
  4. Copy the contents into a new file called: fair-scheduler.xml
  5. On the HiveServer2 host create a new directory to store the xml file (for example, /etc/hive/fsxml)
    Note: This file should not be placed in the standard Hive configuration directory since that directory is managed by Cloudera Manager and the file could be removed when changing other configuration settings.
  6. Upload the fair-scheduler.xml file to the above created directory
  7. In Cloudera Manager navigate to Cluster > Hive > Service-Wide > Advanced > Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml and add the following property:
    <property>
      <name>yarn.scheduler.fair.allocation.file</name>
      <value>/etc/hive/fsxml/fair-scheduler.xml</value>
    </property>
  8. Save changes
  9. Restart the Hive Service

 NOTE: you must have the following rule as the first rule in the placement policy:

<rule name="specified" />
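
For context, a minimal fair-scheduler.xml placement policy with the "specified" rule first might look like the sketch below. The pool names here are illustrative, not from the thread:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Example pools; names are illustrative -->
  <queue name="root">
    <queue name="marketing"/>
    <queue name="finance"/>
  </queue>
  <queuePlacementPolicy>
    <!-- Must come first: honor the queue the end user specified at submit time -->
    <rule name="specified"/>
    <!-- Then fall back to a pool named after the submitting user's primary group -->
    <rule name="primaryGroup" create="false"/>
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
```

With "specified" first, a queue set explicitly by the end user wins; otherwise placement falls through the remaining rules in order.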

 

Wilfred


17 REPLIES

New Contributor

Hi, we have a similar issue. I use CDH 5.6.0. Has this bug been fixed? I have no idea of the right way to make this work.

 

I would prefer to use the placement rules configured in YARN instead of this workaround.

 

Our situation is quite similar to Madhu's. I have set up 3 queues, and there are 3 corresponding groups in LDAP (these 3 groups also exist in the Linux OS). Is there a way to control which users can submit to specific resource pools in Hive?

 

Thanks,

ywheel

Super Collaborator

This has been fixed in later releases of Cloudera Manager and CDH. When you manage the cluster through CM, the configuration, and any later changes, will be automatically deployed to the HiveServer2.

 

Also, don't forget to make sure the hive user has permission to submit to all queues. The simplest way is to add hive to the submit ACL of the root queue.
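
In fair-scheduler.xml terms, adding hive to the root queue's submit ACL could look like the sketch below. Note the FairScheduler ACL format is comma-separated users, then a space, then comma-separated groups, and access to an ancestor queue grants submit access to its descendants, which is why root is the simplest place for this:

```xml
<allocations>
  <queue name="root">
    <!-- "hive" before the space = user entry; groups would go after the space -->
    <aclSubmitApps>hive </aclSubmitApps>
    <!-- child queues keep their own ACLs for everyone else -->
  </queue>
</allocations>
```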

 

Wilfred

New Contributor

Thanks a lot!

 

Which versions of CDH and Cloudera Manager include this fix? I'm on CDH 5.6.0, and it doesn't seem to work yet.

Super Collaborator

You must be on 5.8.0 or later for both CDH and CM.

 

Wilfred

New Contributor
Got it! Thanks for your quick reply.

Best,
ywheel

New Contributor

Hi, we have a similar issue to Madhu's. By the way, we are using CDH 5.12.0.

The following is Madhu's description:

 

We have our cluster Kerberized and we also deployed Sentry; as part of the setup we disabled impersonation in Hive, so all Hive queries are executed as the hive user.
We configured dynamic resource pools, setting up 3 queues: HighPriority, LowPriority and Default.
Everybody can submit jobs to the default queue; that is working as expected.
The HighPriority and LowPriority queues are managed by membership in two different AD groups.

I assigned a test user to both groups so it could submit jobs to both queues (HighPriority, LowPriority). When I submitted a job,
we got the following error message:

ERROR : Job Submission failed with exception 'java.io.IOException(Failed to run job : User hive cannot submit applications to queue root.HighPriority)'
java.io.IOException: Failed to run job : User hive cannot submit applications to queue root.HighPriority

This is correct, because the hive user is not a member of either of those groups.
I modified the submission access control to add the hive user to the pool, and this time the job completed. However, that breaks the access control model we are trying to implement, because now all Hive users can use both pools even though they don't belong to any of the AD groups that are supposed to control who can submit jobs to the pool.

Is there a way to control which users can submit to specific resource pools in Hive and leverage the AD groups created for this purpose?

Super Collaborator

The placement rules are evaluated as the original end user. That means the job will be placed in the correct pool. The end user cannot override that, because the mapred.job.queue.name property should be blacklisted.
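
One way to enforce that blacklist on the Hive side is the real Hive property hive.conf.restricted.list, which prevents clients from changing the listed keys with `set` at runtime. A sketch for the hive-site.xml safety valve; verify the default value for your Hive version and keep it in the list, since setting this property replaces rather than appends:

```xml
<property>
  <name>hive.conf.restricted.list</name>
  <!-- Keys listed here cannot be overridden by "set" in a session.
       Assumption: append the queue-name keys to your version's defaults. -->
  <value>mapred.job.queue.name,mapreduce.job.queuename</value>
</property>
```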

The hive user should never be accessible to any end user; it is a service principal, and allowing end users to use it will give you far bigger issues.

 

 

I thus do not see how adding hive as a user to the ACL breaks it.

 

Wilfred

New Contributor

Hi Wilfred,

 

I'm sorry to ask again, but I'm facing the same problem and I don't understand how to configure the Dynamic Resource Pool Configuration so that placement works using the original user's groups (mine, not hive's).

I'm using CDH 5.13 with Kerberos and Sentry. As I am using Sentry, impersonation is disabled.

My configuration is:

root

|--A

|--B

On root, the submission ACL is set to allow only the "sentry" user to submit to this pool.

On A, the submission ACL is set to allow only group A to submit to this pool.

On B, the submission ACL is set to allow only group B to submit to this pool.

The placement rules are:

1 - "Use the pool specified at run time, only if the pool exists."

2 - "Use the pool root.[username] and create the pool if it does not exist."
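
Expressed as a fair-scheduler.xml sketch (pool names from the post above; "groupA" and "groupB" are placeholder group names, and the ACL syntax is "users groups", space-separated):

```xml
<allocations>
  <queue name="root">
    <!-- only the sentry user directly on root -->
    <aclSubmitApps>sentry </aclSubmitApps>
    <queue name="A">
      <!-- no users, only group A (placeholder name) -->
      <aclSubmitApps> groupA</aclSubmitApps>
    </queue>
    <queue name="B">
      <aclSubmitApps> groupB</aclSubmitApps>
    </queue>
  </queue>
  <queuePlacementPolicy>
    <!-- 1: use the pool specified at run time, only if it exists -->
    <rule name="specified" create="false"/>
    <!-- 2: use root.[username], creating it if needed -->
    <rule name="user" create="true"/>
  </queuePlacementPolicy>
</allocations>
```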

 

When I submit a query as a user from group A, using Hue and setting "set mapred.job.queue.name=A;", I get the error: "User hive cannot submit applications to queue root.A"

 

If I add hive to the allowed users on root, the query works fine, but then users from both group A and group B can submit queries anywhere.

If I add hive only to the "A" resource pool, then users from both group A and group B can submit queries to resource pool A, but no one can submit to resource pool B.

 

Maybe I am missing an important part, but I don't get the behavior you explained, and if I add hive to the authorized users it breaks the ACLs, since every user could then use all the resource pools.

 

Can you give us the right configuration to get the behavior you described?