Community Articles

aromero · 03-22-2016

Introduction

Security best practices when using Ranger dictate that Hive jobs should ideally run as user 'hive', so that only Ranger Hive policies govern end-user access to data and 'hive' owns the entire Hive directory and file structure on HDFS. This is achieved by setting hive.server2.enable.doAs to 'false'. It also improves performance, because it enables container pre-warming for Tez, which applies only to jobs started by 'hive' and not to jobs started by other end users.

Problem

The problem introduced by doAs = false is that, if YARN Capacity Scheduler queue mappings have been defined on a user/group basis, they no longer apply: all jobs are started by the same user ('hive'), so the mappings cannot distinguish between end users and the per-user and per-group queue definitions have no effect.
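To make the problem concrete, here is a minimal sketch of how user/group mappings get resolved. It assumes the Capacity Scheduler syntax of yarn.scheduler.capacity.queue-mappings (comma-separated u:user:queue and g:group:queue entries, evaluated left to right with the first match winning). The class is a simplified illustration, not YARN's implementation, and the user, group, and queue names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Simplified illustration of user/group queue mapping; NOT YARN's actual code.
final class QueueMappingDemo {

    /**
     * Resolves a queue from entries of the form u:user:queue or g:group:queue.
     * Entries are checked left to right and the first match wins.
     */
    static Optional<String> resolve(String mappings, String user, List<String> groups) {
        for (String entry : mappings.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 3) {
                continue; // skip malformed entries in this sketch
            }
            boolean userMatch = parts[0].equals("u") && parts[1].equals(user);
            boolean groupMatch = parts[0].equals("g") && groups.contains(parts[1]);
            if (userMatch || groupMatch) {
                return Optional.of(parts[2]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical mappings: analysts go to 'adhoc', etl goes to 'batch'.
        String mappings = "g:analysts:adhoc,g:etl:batch";

        // doAs = true: the job is submitted as the end user, so her group matches.
        System.out.println(resolve(mappings, "alice", List.of("analysts")));

        // doAs = false: every job is submitted as 'hive' (group name assumed here),
        // so no end-user group is ever seen and nothing matches.
        System.out.println(resolve(mappings, "hive", List.of("hadoop")));
    }
}
```

With doAs = false, the second call is the only case the scheduler ever sees, regardless of who issued the query.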

Solution

One solution is to use a Hive hook that detects the real user who started the query, so that the job can be submitted to the right queue even though it still runs as user 'hive'. The hook then looks up the groups the user belongs to and tries to match them against a group-mappings file (with the structure groupname:queuename). When one of the user's groups matches, the job is automatically submitted to the corresponding queue.

The Hive hook can be found in:

https://github.com/beto983/Hive-Utils

This Hive hook detects the user who started the Hive session, finds the groups that user belongs to, and sends the job to the corresponding queue according to the mappings defined in the group-mappings file.

It is based on this other hook, which submits the job to a queue named after the user's primary group:

https://github.com/gbraccialli/HiveUtils
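The matching step described above can be sketched as follows. This is only an illustration of the idea, not the code of either hook: the class and method names are hypothetical, the order in which groups are tried is an assumption, and so is the fallback to a queue named 'default' when no group matches (the article does not say what the hook does in that case).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of group-to-queue matching; not the actual hook source.
final class GroupQueueMatcher {

    /**
     * Returns the queue of the first user group that has an entry in the
     * group-to-queue map, or empty if none of the groups is mapped.
     */
    static Optional<String> queueFor(List<String> userGroups, Map<String, String> groupToQueue) {
        for (String group : userGroups) {
            String queue = groupToQueue.get(group);
            if (queue != null) {
                return Optional.of(queue);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical contents of the group-mappings file (groupname:queuename).
        Map<String, String> groupToQueue = new LinkedHashMap<>();
        groupToQueue.put("analysts", "adhoc");
        groupToQueue.put("etl", "batch");

        // Groups of the real end user, as the hook would look them up.
        List<String> groups = List.of("staff", "etl");

        // Assumption: fall back to a queue named 'default' when nothing matches.
        String queue = queueFor(groups, groupToQueue).orElse("default");
        System.out.println("submit to queue: " + queue);
    }
}
```

Iterating over the user's groups, rather than over the file entries, means the user's group order decides which queue wins when several groups are mapped. Whether the real hook behaves this way should be checked against its source.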

Steps to follow:

On all HiveServer2 servers, create the auxlib directory and download the hook jar:

  mkdir -p /usr/hdp/current/hive-client/auxlib/ && wget https://github.com/beto983/Hive-Utils/raw/master/Hive-Utils-1.0-jar-with-dependencies.jar -O /usr/hdp/current/hive-client/auxlib/Hive-Utils-1.0-jar-with-dependencies.jar

Add the following setting to hive-site.xml (Custom hiveserver2-site in Ambari):

  hive.semantic.analyzer.hook=com.github.beto983.hive.hooks.YARNQueueHook

Create a "group-mappings" file in /etc/hive/conf/ with the structure:

  groupname:queuename
  groupname:queuename
  groupname:queuename
  ...

Restart Hive
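Before restarting, it can help to sanity-check the group-mappings file. The following is a minimal sketch of such a check, assuming the strictest reading of the format above: one groupname:queuename pair per line, exactly one colon, non-empty names, and each group listed once. The hook itself may be more lenient; the class name and the rules are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-restart check for the group-mappings file; not part of the hook.
final class GroupMappingsCheck {

    /** Returns one message per problematic line; an empty list means the file looks fine. */
    static List<String> findProblems(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> seenGroups = new HashSet<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines are harmless
            }
            int colon = line.indexOf(':');
            boolean malformed = colon <= 0                   // no colon or empty group
                    || colon == line.length() - 1            // empty queue
                    || line.indexOf(':', colon + 1) != -1;   // more than one colon
            if (malformed) {
                problems.add("line " + (i + 1) + ": expected groupname:queuename");
                continue;
            }
            String group = line.substring(0, colon);
            if (!seenGroups.add(group)) {
                problems.add("line " + (i + 1) + ": duplicate group '" + group + "'");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the steps above; pass another path as the first argument to override.
        Path file = Path.of(args.length > 0 ? args[0] : "/etc/hive/conf/group-mappings");
        List<String> problems = findProblems(Files.readAllLines(file));
        problems.forEach(System.err::println);
        System.out.println(problems.isEmpty() ? "group-mappings looks OK" : "group-mappings has problems");
    }
}
```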

Map Hive jobs to YARN queues
