12-27-2016
10:56 AM
2 Kudos
One workaround that I just tested: run beeline with the following queue parameter:
beeline -u "jdbc:hive2://local:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@local.COM" -e "SELECT count(*) FROM log;" --hiveconf tez.queue.name=prd_am
This requests the query to be executed in the prd_am queue. If the user is allowed access to that queue in Ranger, it works fine. I am still looking for a solution that uses the default mapping defined in the YARN Capacity Scheduler configuration, for example:
yarn.scheduler.capacity.queue-mappings=u:user1:dev_devs, g:devs:dev_devs
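As a side note, and purely as an assumption on my part (I have not verified this on our cluster), the Capacity Scheduler also has a flag that is supposed to make these user/group mappings take precedence over whatever queue the client requests, so the scheduler configuration would look something like this:
yarn.scheduler.capacity.queue-mappings=u:user1:dev_devs,g:devs:dev_devs
yarn.scheduler.capacity.queue-mappings-override.enable=true
Here u:&lt;user&gt;:&lt;queue&gt; maps a single user and g:&lt;group&gt;:&lt;queue&gt; maps a group; with the override flag enabled, the mapped queue should win even when the client asks for a different one.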
12-15-2016
08:51 AM
1 Kudo
Thanks for the response. I just ran this:
beeline -u "jdbc:hive2://local:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@local.COM" -e "SELECT count(*) FROM log;" --hiveconf mapreduce.job.queuename=root.prd_am
It again went to prd_oper, so that does not seem to be it. Somehow the MapReduce setting is overriding the Hive setting...
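For what it is worth (just a suggestion, reusing the same connection string as above), it may help to print the queue-related settings the session actually ends up with, since Hive on Tez generally honours tez.queue.name rather than the MapReduce property:
beeline -u "jdbc:hive2://local:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@local.COM" -e "set tez.queue.name; set mapreduce.job.queuename; set hive.server2.tez.default.queues;"
Each "set &lt;property&gt;;" prints the current value for the session, which should show which of these settings is really in effect.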
12-14-2016
03:22 PM
2 Kudos
Hi all,
I have been operating a 16-node Hortonworks (Teradata Appliance) cluster at a mid-size telco for a few months now. We just completed an upgrade from Ambari 2.4 to 2.5 and updated the whole Hadoop stack as well. The cluster runs in secure mode with Kerberos and Ranger, and the YARN Capacity Scheduler is configured as follows:
yarn.scheduler.capacity.root.queues=prd_oper, prd_analyst, prd_am, dev_oper, dev_devs, tst_devs
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.tst_devs.user-limit-factor=1
yarn.scheduler.capacity.queue-mappings=u:user1:dev_devs
yarn.scheduler.capacity.root.dev_devs.acl_administer_jobs= *
yarn.scheduler.capacity.root.dev_devs.acl_administer_queue= *
yarn.scheduler.capacity.root.dev_devs.acl_submit_applications= *
yarn.scheduler.capacity.root.dev_devs.capacity=1
yarn.scheduler.capacity.root.dev_devs.maximum-capacity=90
yarn.scheduler.capacity.root.dev_devs.state=RUNNING
yarn.scheduler.capacity.root.dev_devs.user-limit-factor=1
yarn.scheduler.capacity.root.dev_oper.acl_administer_jobs= *
yarn.scheduler.capacity.root.dev_oper.acl_administer_queue= *
yarn.scheduler.capacity.root.dev_oper.acl_submit_applications= *
yarn.scheduler.capacity.root.dev_oper.capacity=1
yarn.scheduler.capacity.root.dev_oper.maximum-capacity=90
yarn.scheduler.capacity.root.dev_oper.state=RUNNING
yarn.scheduler.capacity.root.dev_oper.user-limit-factor=1
yarn.scheduler.capacity.root.prd_am.acl_administer_jobs= *
yarn.scheduler.capacity.root.prd_am.acl_administer_queue= *
yarn.scheduler.capacity.root.prd_am.acl_submit_applications= *
yarn.scheduler.capacity.root.prd_am.capacity=1
yarn.scheduler.capacity.root.prd_am.maximum-capacity=90
yarn.scheduler.capacity.root.prd_am.state=RUNNING
yarn.scheduler.capacity.root.prd_am.user-limit-factor=1
yarn.scheduler.capacity.root.prd_analyst.acl_administer_jobs= *
yarn.scheduler.capacity.root.prd_analyst.acl_administer_queue= *
yarn.scheduler.capacity.root.prd_analyst.acl_submit_applications= *
yarn.scheduler.capacity.root.prd_analyst.capacity=10
yarn.scheduler.capacity.root.prd_analyst.maximum-capacity=90
yarn.scheduler.capacity.root.prd_analyst.state=RUNNING
yarn.scheduler.capacity.root.prd_analyst.user-limit-factor=1
yarn.scheduler.capacity.root.prd_oper.acl_administer_jobs= *
yarn.scheduler.capacity.root.prd_oper.acl_administer_queue= *
yarn.scheduler.capacity.root.prd_oper.acl_submit_applications= *
yarn.scheduler.capacity.root.prd_oper.capacity=80
yarn.scheduler.capacity.root.prd_oper.maximum-capacity=90
yarn.scheduler.capacity.root.prd_oper.state=RUNNING
yarn.scheduler.capacity.root.prd_oper.user-limit-factor=1
yarn.scheduler.capacity.root.tst_devs.acl_administer_jobs= *
yarn.scheduler.capacity.root.tst_devs.acl_administer_queue= *
yarn.scheduler.capacity.root.tst_devs.acl_submit_applications= *
yarn.scheduler.capacity.root.tst_devs.capacity=7
yarn.scheduler.capacity.root.tst_devs.maximum-capacity=90
yarn.scheduler.capacity.root.tst_devs.state=RUNNING
With the Ambari upgrade, a new setting is now available (or at least enforced) in the MapReduce2 configuration. It sets the default MapReduce2 queue to prd_oper, which is a valid queue as defined in the settings above, and any map-reduce job now goes to that queue.
PROBLEM: All users always end up in the prd_oper queue defined by that property. Even if I try to override it with a setting like --hiveconf mapred.job.queuename=prd_am, the job still goes to prd_oper, i.e. the queue defined in that setting. This worked fine before the upgrade, when this option was not defined: I could control the queue mapping of each user/group within the Capacity Scheduler settings and submit map-reduce jobs to any queue I needed. I can't remove this property via Ambari because it is mandatory, nor can I change it directly in mapred-site.xml because Ambari overwrites it. In contrast, Spark can still submit to any queue. I need to restore the queue mapping to what it was before the upgrade. Any help will be appreciated!
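In case it helps anyone reproducing this, a quick way to confirm which queue a job actually landed in (standard YARN CLI usage, nothing specific to this cluster; the application id below is a placeholder) is:
yarn application -list -appStates RUNNING
yarn application -status &lt;application_id&gt;
The first command lists running applications together with their queue, and the second prints the queue for a single application.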
Labels:
- Apache Ambari
- Apache Hadoop