Support Questions


spark-shell not getting launched - Queue's AM resource limit exceeded.

Expert Contributor

Hello - I have HDP 2.5.x and I'm trying to launch spark-shell. The ApplicationMaster gets launched, but YARN is not able to assign containers.

Command ->

./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m

Error ->

[Sun Aug 06 19:33:29 +0000 2017] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:6144, vCores:1>; User AM Resource Limit of the queue = <memory:6144, vCores:1>; Queue AM Resource Usage = <memory:6144, vCores:3>;

Any ideas on what parameters to change?

Please note: in YARN, the parameter yarn.scheduler.capacity.maximum-am-resource-percent is set to 0.9, so the AMs should have access to sufficient resources to get containers assigned.

7 REPLIES

Expert Contributor

@smanjee, @mqureshi, @Neeraj Sabharwal - any ideas on this?

Expert Contributor

This is what I see in Ambari (YARN Capacity Scheduler config):

---------------

yarn.scheduler.capacity.root.queues=default,llap
yarn.scheduler.capacity.root.default.user-limit-factor=1 
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.maximum-capacity=60
yarn.scheduler.capacity.root.default.capacity=60
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.maximum-am-resource-percent=0.9
yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.acl_administer_queue=hive
yarn.scheduler.capacity.root.llap.acl_submit_applications=hive
yarn.scheduler.capacity.root.llap.maximum-am-resource-percent=1
yarn.scheduler.capacity.root.llap.maximum-capacity=40
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.ordering-policy=fifo
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1
yarn.scheduler.capacity.root.llap.capacity=40

Expert Contributor
@Karan Alang

Could you please share (at least the relevant part of) your Capacity Scheduler config and tell us how much memory your default queue should have in total? Based on your error, your default queue's AM limit is in fact exceeded: the queue's AM limit is 6144 MB and its current AM usage is already 6144 MB, so the new 2048 MB AM request cannot be activated.

Expert Contributor

@gnovak - here is what I see in the YARN Capacity Scheduler config:

yarn.scheduler.capacity.root.queues=default,llap
yarn.scheduler.capacity.root.default.user-limit-factor=1 
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.maximum-capacity=60
yarn.scheduler.capacity.root.default.capacity=60
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.maximum-am-resource-percent=0.9
yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.acl_administer_queue=hive
yarn.scheduler.capacity.root.llap.acl_submit_applications=hive
yarn.scheduler.capacity.root.llap.maximum-am-resource-percent=1
yarn.scheduler.capacity.root.llap.maximum-capacity=40
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.ordering-policy=fifo
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1
yarn.scheduler.capacity.root.llap.capacity=40

Expert Contributor
@Karan Alang

Based on this information I assume you have 12 GB of memory in the cluster and the minimum allocation is set to 1024 MB. The default queue has a configured capacity of 60%, i.e. about 7 GB. The AM limit is 6 GB (7.2 GB * 0.9, rounded down to a whole GB), and it is full; probably three other AMs are running. Please correct me if I'm wrong!
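
Spelled out, the arithmetic I'm assuming looks roughly like this (a 12288 MB cluster and the 1024 MB minimum allocation, so values round down to whole GBs):

12288 MB * 0.60 = ~7372 MB -> configured capacity of the default queue (~7 GB)
 7372 MB * 0.90 = ~6635 MB -> rounded down to 6144 MB, the AM limit in your error
 6144 MB / 2048 MB per AM  -> room for 3 AMs, matching the reported usage <memory:6144, vCores:3>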

To get more memory, you might try these things (example settings follow the list):

  • Add more memory to the cluster 😛
  • Increase the maximum-capacity of the default queue, so that it can use more resources when the LLAP queue doesn't use them
  • Increase the maximum-am-resource-percent of the default queue to 1
  • Decrease the minimum-allocation-mb: this way the other AMs (and containers) might use fewer resources (e.g. if you need 1.2 GB - just for the sake of the example - then with the default 1 GB minimum allocation you still end up with a 2 GB container)
  • Kill other applications from the queue or wait until they finish
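
For reference, a minimal sketch of what those changes could look like; the values are only illustrative, adjust them to your cluster.

Let the default queue grow beyond its 60% share when llap is idle (Capacity Scheduler config):
yarn.scheduler.capacity.root.default.maximum-capacity=100

Raise the AM share for the default queue only (a queue-level override, like the one your llap queue already has):
yarn.scheduler.capacity.root.default.maximum-am-resource-percent=1

Lower the minimum container size (yarn-site.xml; this one typically needs a ResourceManager restart):
yarn.scheduler.minimum-allocation-mb=512

Or free up the queue from the command line:
yarn application -list
yarn application -kill <application_id>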

Expert Contributor

@gnovak - Thanks! How did you conclude that my cluster has 12 GB RAM? I mean, which parameter indicates that?

Expert Contributor

The default queue's AM limit is 6144 MB, so the default queue's capacity must be about 7 GB (for 6 GB the limit would be 5 GB, and for 8 GB it would be 7 GB, with a maximum-am-resource-percent of 0.9). Since default.capacity = 60, the whole cluster's capacity equals roughly 100 / 60 * 7 GB, which could indicate 12 or 13 GB in total, but the latter would be very unusual.
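
As a quick sanity check with numbers (again assuming a 1024 MB minimum allocation, so the limits round down to whole GBs):

6 GB * 0.9 = 5.4 GB -> AM limit 5120 MB
7 GB * 0.9 = 6.3 GB -> AM limit 6144 MB (matches your error)
8 GB * 0.9 = 7.2 GB -> AM limit 7168 MB

7 GB / 0.60 = ~11.7 GB -> ~12 GB of total cluster memory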

Did you manage to overcome your issue with any of my suggestions?