12-27-2016
10:56 AM
2 Kudos
One workaround that I just tested: run beeline with the following queue parameter:
beeline -u "jdbc:hive2://local:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@local.COM" -e "SELECT count(*) FROM log;" --hiveconf tez.queue.name=prd_am
This requests the query to be executed in the prd_am queue. If the user is allowed access to that queue in Ranger, it works fine. I am still looking for a solution that uses the default mapping defined in the YARN Capacity Scheduler configuration, for example:
yarn.scheduler.capacity.queue-mappings=u:user1:dev_devs, g:devs:dev_devs
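As a side note, and purely as an assumption on my part (I have not verified this on our cluster), the Capacity Scheduler also has a flag that is supposed to make these user/group mappings take precedence over whatever queue the client requests, so the scheduler configuration would look something like this:
yarn.scheduler.capacity.queue-mappings=u:user1:dev_devs,g:devs:dev_devs
yarn.scheduler.capacity.queue-mappings-override.enable=true
Here u:&lt;user&gt;:&lt;queue&gt; maps a single user and g:&lt;group&gt;:&lt;queue&gt; maps a group; with the override flag enabled, the mapped queue should win even when the client asks for a different one.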
12-15-2016
08:51 AM
1 Kudo
Thanks for the response. I just ran this:
beeline -u "jdbc:hive2://local:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@local.COM" -e "SELECT count(*) FROM log;" --hiveconf mapreduce.job.queuename=root.prd_am
It again went to prd_oper, so that does not seem to be it. Somehow the MapReduce setting is overriding the Hive setting...
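For what it is worth (just a suggestion, reusing the same connection string as above), it may help to print the queue-related settings the session actually ends up with, since Hive on Tez generally honours tez.queue.name rather than the MapReduce property:
beeline -u "jdbc:hive2://local:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@local.COM" -e "set tez.queue.name; set mapreduce.job.queuename; set hive.server2.tez.default.queues;"
Each "set &lt;property&gt;;" prints the current value for the session, which should show which of these settings is really in effect.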
12-14-2016
03:22 PM
2 Kudos
Hi all,
I have been operating a 16-node Hortonworks (Teradata Appliance) cluster at a mid-size telco for a few months now. We just completed an upgrade from Ambari 2.4 to 2.5 and updated the whole Hadoop stack as well. The cluster runs in secure mode with Kerberos and Ranger, and the YARN Capacity Scheduler is configured as follows:
yarn.scheduler.capacity.root.queues=prd_oper, prd_analyst, prd_am, dev_oper, dev_devs, tst_devs
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.tst_devs.user-limit-factor=1
yarn.scheduler.capacity.queue-mappings=u:user1:dev_devs
yarn.scheduler.capacity.root.dev_devs.acl_administer_jobs= *
yarn.scheduler.capacity.root.dev_devs.acl_administer_queue= *
yarn.scheduler.capacity.root.dev_devs.acl_submit_applications= *
yarn.scheduler.capacity.root.dev_devs.capacity=1
yarn.scheduler.capacity.root.dev_devs.maximum-capacity=90
yarn.scheduler.capacity.root.dev_devs.state=RUNNING
yarn.scheduler.capacity.root.dev_devs.user-limit-factor=1
yarn.scheduler.capacity.root.dev_oper.acl_administer_jobs= *
yarn.scheduler.capacity.root.dev_oper.acl_administer_queue= *
yarn.scheduler.capacity.root.dev_oper.acl_submit_applications= *
yarn.scheduler.capacity.root.dev_oper.capacity=1
yarn.scheduler.capacity.root.dev_oper.maximum-capacity=90
yarn.scheduler.capacity.root.dev_oper.state=RUNNING
yarn.scheduler.capacity.root.dev_oper.user-limit-factor=1
yarn.scheduler.capacity.root.prd_am.acl_administer_jobs= *
yarn.scheduler.capacity.root.prd_am.acl_administer_queue= *
yarn.scheduler.capacity.root.prd_am.acl_submit_applications= *
yarn.scheduler.capacity.root.prd_am.capacity=1
yarn.scheduler.capacity.root.prd_am.maximum-capacity=90
yarn.scheduler.capacity.root.prd_am.state=RUNNING
yarn.scheduler.capacity.root.prd_am.user-limit-factor=1
yarn.scheduler.capacity.root.prd_analyst.acl_administer_jobs= *
yarn.scheduler.capacity.root.prd_analyst.acl_administer_queue= *
yarn.scheduler.capacity.root.prd_analyst.acl_submit_applications= *
yarn.scheduler.capacity.root.prd_analyst.capacity=10
yarn.scheduler.capacity.root.prd_analyst.maximum-capacity=90
yarn.scheduler.capacity.root.prd_analyst.state=RUNNING
yarn.scheduler.capacity.root.prd_analyst.user-limit-factor=1
yarn.scheduler.capacity.root.prd_oper.acl_administer_jobs= *
yarn.scheduler.capacity.root.prd_oper.acl_administer_queue= *
yarn.scheduler.capacity.root.prd_oper.acl_submit_applications= *
yarn.scheduler.capacity.root.prd_oper.capacity=80
yarn.scheduler.capacity.root.prd_oper.maximum-capacity=90
yarn.scheduler.capacity.root.prd_oper.state=RUNNING
yarn.scheduler.capacity.root.prd_oper.user-limit-factor=1
yarn.scheduler.capacity.root.tst_devs.acl_administer_jobs= *
yarn.scheduler.capacity.root.tst_devs.acl_administer_queue= *
yarn.scheduler.capacity.root.tst_devs.acl_submit_applications= *
yarn.scheduler.capacity.root.tst_devs.capacity=7
yarn.scheduler.capacity.root.tst_devs.maximum-capacity=90
yarn.scheduler.capacity.root.tst_devs.state=RUNNING
With the Ambari upgrade, a new setting is now available (or at least enforced) in the MapReduce2 configuration. It sets the default MapReduce2 queue to prd_oper, which is a valid queue as defined in the settings above, and any map-reduce job now goes to that queue.
PROBLEM: All users always end up in the prd_oper queue defined by that property. Even if I try to override it with a setting like --hiveconf mapred.job.queuename=prd_am, the job still goes to prd_oper, i.e. the queue defined in that setting. This worked fine before the upgrade, when this option was not defined: I could control the queue mapping of each user/group within the Capacity Scheduler settings and submit map-reduce jobs to any queue I needed. I can't remove this property via Ambari because it is mandatory, nor can I change it directly in mapred-site.xml because Ambari overwrites it. In contrast, Spark can still submit to any queue. I need to restore the queue mapping to what it was before the upgrade. Any help will be appreciated!
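In case it helps anyone reproducing this, a quick way to confirm which queue a job actually landed in (standard YARN CLI usage, nothing specific to this cluster; the application id below is a placeholder) is:
yarn application -list -appStates RUNNING
yarn application -status &lt;application_id&gt;
The first command lists running applications together with their queue, and the second prints the queue for a single application.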
Labels:
- Apache Ambari
- Apache Hadoop