Introduction

Security best practices for Ranger dictate that Hive jobs should ideally run as user 'hive', so that only Ranger Hive policies govern end-user access to data and 'hive' owns the entire Hive directory and file structure on HDFS. This is achieved by setting hive.server2.enable.doAs to 'false'. It can also improve performance, because container pre-warming for Tez applies only to jobs started by 'hive', not to jobs started by other end users.

Problem

The problem introduced by doAs = false is that, if YARN Capacity Scheduler queue mappings have been defined on a user/group basis, the mappings no longer apply: every job is started by the same user (i.e. 'hive'), so the per-user and per-group queue definitions have no effect.
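
To make the problem concrete, here is a deliberately simplified model of how a user/group mapping string of the form u:user:queue,g:group:queue is resolved, with the first match from left to right winning. It is only a sketch and not the scheduler's actual code: the user 'alice', the groups, the queue names, and the fallback to a queue called 'default' are all assumptions chosen for illustration.

```python
def resolve_queue(mappings, user, groups, default="default"):
    """Return the queue of the first matching mapping, scanning left to right."""
    for entry in mappings.split(","):
        kind, name, queue = entry.strip().split(":")
        if kind == "u" and name == user:
            return queue
        if kind == "g" and name in groups:
            return queue
    # No mapping matched: fall back to the default queue (assumed name).
    return default


# Hypothetical mappings: alice goes to 'analytics', group 'etl' goes to 'batch'.
MAPPINGS = "u:alice:analytics,g:etl:batch"

# doAs = true: the job is submitted as the end user, so her mapping applies.
as_end_user = resolve_queue(MAPPINGS, "alice", ["analysts"])

# doAs = false: the job is submitted as 'hive', whatever user ran the query.
as_hive = resolve_queue(MAPPINGS, "hive", ["hadoop"])

print(as_end_user)  # analytics
print(as_hive)      # default
```

With doAs = false the second call is the only one the scheduler ever sees, which is why every query ends up in the same queue.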

Solution

One solution is to use a Hive hook that detects the real user who started the query, so that the job can be submitted to the right queue even though it still runs as user 'hive'. The hook looks up the groups the user belongs to and tries to match them against a group-mappings file (with the structure groupname:queuename). When one of the user's groups matches, the hook submits the job to the mapped queue.
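
The matching step described above can be sketched in a few lines. This is an illustration of the idea only, not the hook's source (the hook itself is a Java class). The function name pick_queue, the sample groups and queues, the choice to walk the user's groups in order, and returning None when nothing matches are all assumptions.

```python
def pick_queue(user_groups, group_to_queue):
    """Return the queue mapped to the first of the user's groups that has a mapping.

    user_groups:    groups of the user who started the session, in order
    group_to_queue: dict built from the groupname:queuename lines
    """
    for group in user_groups:
        queue = group_to_queue.get(group)
        if queue is not None:
            return queue
    # No group matched; what the real hook does here is not covered by this sketch.
    return None


# Hypothetical mappings and group memberships.
group_to_queue = {"etl": "batch", "analysts": "adhoc"}

print(pick_queue(["staff", "analysts"], group_to_queue))  # adhoc
print(pick_queue(["staff"], group_to_queue))              # None
```

If a user belongs to several mapped groups, the order in which groups are checked decides the queue, so it is worth keeping group memberships and mappings unambiguous.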

The Hive hook can be found in:

https://github.com/beto983/Hive-Utils

This Hive hook detects the user who started the Hive session, finds the groups that user belongs to, and sends the job to the corresponding queue according to the mappings defined in the group-mappings file.

It is based on this other hook, which submits the job to a queue named after the user's primary group:

https://github.com/gbraccialli/HiveUtils

Steps to follow:

  1. On all HiveServer2 servers, run: mkdir -p /usr/hdp/current/hive-client/auxlib/ && wget https://github.com/beto983/Hive-Utils/raw/master/Hive-Utils-1.0-jar-with-dependencies.jar -O /usr/hdp/current/hive-client/auxlib/Hive-Utils-1.0-jar-with-dependencies.jar
  2. Add the following setting to hive-site.xml (Custom hiveserver2-site on Ambari): hive.semantic.analyzer.hook=com.github.beto983.hive.hooks.YARNQueueHook
  3. Create a "group-mappings" file in /etc/hive/conf/ with the structure:
      groupname:queuename
      groupname:queuename
      groupname:queuename
      ...
    
  4. Restart Hive
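
Before restarting, it can be useful to sanity-check the group-mappings file, since a malformed line is easy to miss. The sketch below parses text in the groupname:queuename structure and reports lines that do not fit it. It is a standalone helper, not part of the hook: the function name, skipping blank lines, trimming whitespace, and keeping the first entry for a repeated group are assumptions, and the hook's own parsing rules may differ.

```python
def parse_group_mappings(text):
    """Parse 'groupname:queuename' lines.

    Returns (mappings, errors), where mappings is a dict of group -> queue
    and errors is a list of (line_number, raw_line) for malformed lines.
    """
    mappings = {}
    errors = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored (assumption)
        group, sep, queue = line.partition(":")
        group, queue = group.strip(), queue.strip()
        # A valid line has exactly one ':' with a non-empty name on each side.
        if not sep or not group or not queue or ":" in queue:
            errors.append((lineno, raw))
            continue
        # Keep the first mapping seen for a group (assumption).
        mappings.setdefault(group, queue)
    return mappings, errors


# Hypothetical file content; line 4 is deliberately malformed.
sample = "etl:batch\n\nanalysts:adhoc\nbroken-line\n"
mappings, errors = parse_group_mappings(sample)
print(mappings)  # {'etl': 'batch', 'analysts': 'adhoc'}
print(errors)    # [(4, 'broken-line')]
```

To check the real file, read /etc/hive/conf/group-mappings with open(...).read() and pass the result to parse_group_mappings; an empty errors list means every non-blank line fits the expected structure.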