
Tez job hang, waiting for AM container to be allocated.

[Screenshot attached: 11337-3.png]

Hi Team,

The job hangs while loading data from HDFS into a relation in the Pig Grunt shell using Tez (I tried MapReduce too), and the web UI shows the following message:

ACCEPTED: waiting for AM container to be allocated, launched and register with RM

Please advise.

1 ACCEPTED SOLUTION

Re: Tez job hang, waiting for AM container to be allocated.

Contributor

@sagar pavan This happens when there are not enough resources (memory) to run the ApplicationMaster (AM) container needed to control the Tez job. In YARN's capacity-scheduler.xml there is a property

yarn.scheduler.capacity.maximum-am-resource-percent

which controls the percentage of total cluster memory that AM containers may use. If several jobs are running, each AM consumes the memory of one container. If this exceeds the given percentage of total cluster memory, the next AM to run will wait until resources free up for it. You'll need to increase yarn.scheduler.capacity.maximum-am-resource-percent to get the AM to run.
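As a sketch of what that looks like in capacity-scheduler.xml (the value is a fraction of cluster resources, so 0.2 means 20%; the 0.4 shown here is an example value, not a recommendation):

```xml
<!-- capacity-scheduler.xml: cap on the fraction of cluster resources
     that ApplicationMaster containers may consume -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.4</value> <!-- raised from the default 0.2 (20%) -->
</property>
```

After editing, refresh the scheduler (e.g. `yarn rmadmin -refreshQueues`) or restart the ResourceManager for the change to take effect.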


6 REPLIES

Re: Tez job hang, waiting for AM container to be allocated.

Rising Star

@sagar pavan The diagnostic message indicates that the user's AM resource limit has been exceeded. Please review the Capacity Scheduler's AM resource limit and raise it from the default of 20%; this should allow the AM container to be launched.


Re: Tez job hang, waiting for AM container to be allocated.

@Terry Stebbens and @Ian Roberts: it is currently set to yarn.scheduler.capacity.maximum-am-resource-percent=0.2. To what value can I increase it?

Re: Tez job hang, waiting for AM container to be allocated.

Thanks @Terry Stebbens and @Ian Roberts. I have changed it to yarn.scheduler.capacity.maximum-am-resource-percent=0.4 and it's working fine.

Re: Tez job hang, waiting for AM container to be allocated.

Expert Contributor

The AM percent property in YARN is relevant when the cluster has idle resources but an AM is still not being started for the application. On the YARN UI you will see available capacity, yet the AM does not start; e.g. the cluster has 100 GB of capacity and is using only 50 GB. If you want to run X apps concurrently and each AM needs M GB of resources (per its configuration), then you need X*M GB of capacity for AMs, and this can be used to determine the AM percent as a fraction of total cluster capacity.

On the other hand, if the cluster has no free capacity at that time (as seen in the YARN UI), changing the AM percent may not help: there is no container slot available for the AM. E.g. the cluster has 100 GB of capacity and is already using all 100 GB. In this case you will have to wait for capacity to free up.
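The X*M sizing rule above can be sketched with a small calculation (the numbers here are hypothetical, not from this thread):

```python
def required_am_percent(concurrent_apps: int, am_mem_gb: float,
                        cluster_mem_gb: float) -> float:
    """Fraction of cluster memory that AM containers need in order to
    run `concurrent_apps` applications at once, each AM using
    `am_mem_gb` GB, on a cluster with `cluster_mem_gb` GB total."""
    return (concurrent_apps * am_mem_gb) / cluster_mem_gb

# e.g. 10 concurrent apps, each AM needing 2 GB, on a 100 GB cluster:
# 10 * 2 / 100 = 0.2, so the default 20% limit is only just enough,
# and an 11th job's AM would wait in ACCEPTED state.
print(required_am_percent(10, 2, 100))  # prints 0.2
```

Setting yarn.scheduler.capacity.maximum-am-resource-percent a little above this computed fraction leaves headroom for one more AM without starving worker containers.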
