Support Questions


Tez job hangs, waiting for AM container to be allocated

Hi Team,

The job hangs while loading data from HDFS into a relation in the Pig grunt shell using Tez (I tried MapReduce too), and the web UI shows the following message:

ACCEPTED: waiting for AM container to be allocated, launched and register with RM

Please suggest.

1 ACCEPTED SOLUTION

Expert Contributor

@sagar pavan This happens when there are not enough resources (memory) to run the ApplicationMaster (AM) container that controls the Tez job. In YARN's capacity-scheduler.xml there is a property

yarn.scheduler.capacity.maximum-am-resource-percent

which controls the percentage of total cluster memory that can be used by AM containers. If several jobs are running, each AM consumes the memory required for one container; once the AMs exceed that percentage of total cluster memory, the next AM to run will wait until resources free up for it. You will need to increase yarn.scheduler.capacity.maximum-am-resource-percent to get the AM to run.
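
For reference, a minimal sketch of what that entry looks like in capacity-scheduler.xml (the 0.2 shown here simply mirrors the current value reported later in this thread, not a recommendation):

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <!-- Maximum fraction of cluster resources that may be used to run
       ApplicationMaster containers across the cluster. -->
  <value>0.2</value>
</property>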

6 REPLIES

Expert Contributor

@sagar pavan The diagnostic message indicates that the user's AM resource limit has been exceeded. Please review the capacity scheduler's AM resource limit and raise it from the default of 20%; this should allow the AM container to be launched.

Expert Contributor

@sagar pavan This happens when there are not enough resources (memory) to run the ApplicationMaster (AM) container that controls the Tez job. In YARN's capacity-scheduler.xml there is a property

yarn.scheduler.capacity.maximum-am-resource-percent

which controls the percentage of total cluster memory that can be used by AM containers. If several jobs are running, each AM consumes the memory required for one container; once the AMs exceed that percentage of total cluster memory, the next AM to run will wait until resources free up for it. You will need to increase yarn.scheduler.capacity.maximum-am-resource-percent to get the AM to run.

@Terry Stebbens and @Ian Roberts Currently it is set to yarn.scheduler.capacity.maximum-am-resource-percent=0.2. To what percentage can I increase the value?

Thanks @Terry Stebbens and @Ian Roberts, I have changed it to yarn.scheduler.capacity.maximum-am-resource-percent=0.4 and it's working fine.
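
For anyone following along, a hedged sketch of what that change would look like in capacity-scheduler.xml, together with the usual way to make the capacity scheduler reload its configuration (yarn rmadmin -refreshQueues, or a ResourceManager restart, e.g. via Ambari):

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <!-- Raised from 0.2 so that more ApplicationMasters can run concurrently.
       Apply with: yarn rmadmin -refreshQueues (or restart the ResourceManager). -->
  <value>0.4</value>
</property>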

Super Collaborator

The AM percent property in YARN is relevant when the cluster has idle resources but an AM is still not being started for the application. On the YARN UI you will see available capacity but the AM not starting, e.g. the cluster has 100 GB of capacity and is using only 50 GB. If you want to run X apps concurrently and each AM needs M GB of resources (per its configuration), then you need X*M GB of capacity for AMs, and this can be used to determine the AM percent as a fraction of the total cluster capacity; see the worked example below.

On the other hand, if the cluster does not have any spare capacity at that time (as seen in the YARN UI), then changing the AM percent may not help: the cluster has no capacity left to obtain a container slot for the AM. E.g. the cluster has 100 GB of capacity and is already using all 100 GB. In this case you will have to wait for capacity to free up.
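
To make the first case concrete, here is a small worked example (all numbers are assumed, purely for illustration): suppose the cluster has 100 GB of total memory, each AM container is configured for 2 GB, and you want up to 10 applications running concurrently. The AMs then need 10 * 2 GB = 20 GB, so yarn.scheduler.capacity.maximum-am-resource-percent would need to be at least 20 / 100 = 0.2. For 20 concurrent applications it would be 20 * 2 GB = 40 GB, i.e. a setting of at least 0.4.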
