Created 12-03-2018 01:51 PM
Hi,
I have one NodeManager and am using just 5.4% of the queue; however, Tez jobs always stay in ACCEPTED status. The following error shows up in the Diagnostics tab. I don't have any issues with Spark jobs.
Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:50176, vCores:1>; Queue Resource Limit for AM = <memory:35840, vCores:1>; User AM Resource Limit of the queue = <memory:35840, vCores:1>; Queue AM Resource Usage = <memory:5120, vCores:1>;
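The numbers in that message can be checked with a small sketch (simplified; the Capacity Scheduler's actual computation in `LeafQueue` also accounts for node labels and user limits, and the 89600 MB effective queue capacity below is inferred from the reported limit, not a measured value):

```python
# Simplified sketch of how the Capacity Scheduler derives a queue's AM limit.
def am_resource_limit_mb(effective_queue_capacity_mb, max_am_percent):
    """Approximate AM resource limit for a queue, in MB."""
    return int(effective_queue_capacity_mb * max_am_percent)

# 89600 MB is the effective queue capacity implied by the reported
# limit (35840 / 0.4) -- an inference for illustration.
limit = am_resource_limit_mb(89600, 0.4)
am_request = 50176  # "AM Resource Request" from the diagnostics

print(limit)               # 35840, matching "Queue Resource Limit for AM"
print(am_request > limit)  # True: the AM request exceeds the limit,
                           # so the app stays in ACCEPTED
```

In other words, raising `maximum-am-resource-percent` only helps if the resulting limit actually exceeds the 50176 MB AM request.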
I have increased the AM resource percent in the Capacity Scheduler config to 40% and even 60%, but still no luck. The Capacity Scheduler config is below. Any thoughts?
yarn.scheduler.capacity.maximum-am-resource-percent=0.6
yarn.scheduler.capacity.maximum-applications=30
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
yarn.scheduler.capacity.root.Training.acl_submit_applications=*
yarn.scheduler.capacity.root.Training.capacity=60
yarn.scheduler.capacity.root.Training.maximum-am-resource-percent=0.4
yarn.scheduler.capacity.root.Training.maximum-capacity=100
yarn.scheduler.capacity.root.Training.minimum-user-limit-percent=10
yarn.scheduler.capacity.root.Training.ordering-policy=fifo
yarn.scheduler.capacity.root.Training.priority=1
yarn.scheduler.capacity.root.Training.state=RUNNING
yarn.scheduler.capacity.root.Training.user-limit-factor=1
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.acl_submit_applications=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_queue=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=20
yarn.scheduler.capacity.root.default.maximum-capacity=20
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.default.ordering-policy=fifo
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.llap.acl_administer_queue=*
yarn.scheduler.capacity.root.llap.acl_submit_applications=*
yarn.scheduler.capacity.root.llap.capacity=20
yarn.scheduler.capacity.root.llap.maximum-capacity=20
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.priority=0
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.priority=0
yarn.scheduler.capacity.root.queues=Training,default,llap
I would really appreciate your help.
M.J
Created 12-05-2018 06:38 PM
Hi @Mahsa Jan,
If you go to the ResourceManager UI --> click on the <app ID> --> click on Logs --> click on syslog,
what do you see? Can you share the output?
Regards,
AQ
Created 12-06-2018 03:05 AM
Hi Aquilodran,
Please find the following under the Diagnostics tab. There is no error under the Logs tab.
Scheduler Key | Resource Name | Capability | # Containers | Relax Locality | Node Label Expression |
---|---|---|---|---|---|
0 | * | <Memory:73728;vCores:> | 1 | true | N/A |
Allocated Resource | Running Containers | Preempted Resource | Num Non-AM container preempted | Num AM container preempted | Aggregated Resource Usage |
---|---|---|---|---|---|
0 MBs, 0 VCores | 0 | 0 MBs, 0 VCores | 0 | 0 | 0 MBs, 0 VCores (× Secs) |
Created 12-06-2018 03:06 AM
And this is my latest Capacity Scheduler config:
yarn.scheduler.capacity.maximum-am-resource-percent=0.4
yarn.scheduler.capacity.maximum-applications=1000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
yarn.scheduler.capacity.root.Training.acl_submit_applications=*
yarn.scheduler.capacity.root.Training.capacity=20
yarn.scheduler.capacity.root.Training.maximum-am-resource-percent=0.4
yarn.scheduler.capacity.root.Training.maximum-capacity=60
yarn.scheduler.capacity.root.Training.minimum-user-limit-percent=10
yarn.scheduler.capacity.root.Training.ordering-policy=fifo
yarn.scheduler.capacity.root.Training.priority=1
yarn.scheduler.capacity.root.Training.state=RUNNING
yarn.scheduler.capacity.root.Training.user-limit-factor=1
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.acl_submit_applications=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_queue=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=20
yarn.scheduler.capacity.root.default.maximum-capacity=20
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.default.ordering-policy=fifo
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.llap.acl_administer_queue=*
yarn.scheduler.capacity.root.llap.acl_submit_applications=*
yarn.scheduler.capacity.root.llap.capacity=60
yarn.scheduler.capacity.root.llap.maximum-capacity=60
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.priority=0
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.priority=0
yarn.scheduler.capacity.root.queues=Training,default,llap
Created 12-10-2018 03:40 PM
Hi @Mahsa Jan,
Could you please check the status of "Active Nodes" and "Unhealthy Nodes" in the ResourceManager UI,
and also check the output of: yarn node -list
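If it helps, that check can be scripted. This is a hedged sketch: it assumes the yarn CLI is on the PATH and the usual `yarn node -list` column layout (Node-Id, Node-State, Node-Http-Address, running containers); adjust the parsing if your Hadoop version formats it differently.

```python
import subprocess

def non_running_nodes(listing: str) -> list:
    """Return data rows whose Node-State column is not RUNNING."""
    bad = []
    for line in listing.splitlines():
        parts = line.split()
        # Data rows look like: host:port  STATE  host:http-port  N
        if len(parts) >= 4 and ":" in parts[0] and parts[1] != "RUNNING":
            bad.append(line)
    return bad

try:
    out = subprocess.run(["yarn", "node", "-list", "-all"],
                         capture_output=True, text=True).stdout
except FileNotFoundError:
    # yarn CLI not available here; a hypothetical sample row for illustration
    out = "host1.example.com:45454 RUNNING host1.example.com:8042 0"

for row in non_running_nodes(out):
    print(row)  # any UNHEALTHY/LOST NodeManagers show up here
```

If this prints nothing, all registered NodeManagers report RUNNING and the problem is more likely the AM resource limit than node health.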
Regards,
AQ