MapReduce job hangs, waiting for AM container to be allocated.

Contributor

Hi Team,

The job hangs while importing tables via Sqoop, showing the following message in the web UI:

ACCEPTED: waiting for AM container to be allocated, launched and register with RM

Kindly suggest.

3413-sqoop-mapreduce.jpg

1 ACCEPTED SOLUTION

Super Guru
@Nilesh

There are 2 unhealthy nodes. If you click on the "2" under the Unhealthy Nodes section, you will see the reason they are unhealthy; it could be a bad disk, etc. Please check that, and also look at the NodeManager's logs; you will find more info there.


12 REPLIES

Contributor

Hi, Nilesh. Have you checked the YARN scheduler? Is the default queue out of resources?

@Nilesh

I would suggest you revisit your YARN memory configuration. It seems you might be running out of memory.

Can you please let me know what value you have set for yarn.scheduler.capacity.maximum-am-resource-percent?

If possible, please attach:

1. Yarn RM UI snap

2. yarn-site.xml

3. mapred-site.xml

4. Scheduler snap from RM UI -

http://<RM-Hostname>:8088/cluster/scheduler

Contributor

@Sagar Shimpi

yarn.scheduler.capacity.maximum-am-resource-percent=0.2
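For reference, this property normally lives in capacity-scheduler.xml; a minimal sketch of the relevant entry (the queue layout around it is assumed to be the default):

<!-- capacity-scheduler.xml: at most 20% of cluster resources may be used by ApplicationMasters -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>

If this limit is too low relative to the cluster's total memory, AM containers cannot be allocated and new jobs stay in the ACCEPTED state.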

Kindly find the attached files for reference.

3415-scheduler-snap.jpg

3418-yarn-memory.jpg

yarn-site.xml

mapred-site.xml

@Nilesh From the Resource Manager UI I see there are no "Active Nodes" running, and hence the "Total Memory" shown in the UI is 0.

Can you check whether your NodeManagers are up and communicating with the RM?

Super Guru
@Nilesh

There are 2 unhealthy nodes. If you click on the "2" under the Unhealthy Nodes section, you will see the reason they are unhealthy; it could be a bad disk, etc. Please check that, and also look at the NodeManager's logs; you will find more info there.

Contributor

@Kuldeep Kulkarni and @Sagar Shimpi

The issue has been resolved by changing the below parameter in yarn-site.xml:

yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage

Previously it was 90%; I changed it to 99, and now the job is in the running state.

Could you please shed some light on this parameter?

Super Guru

@Nilesh -

yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage

The maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager will check for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
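A minimal sketch of how this is typically set in yarn-site.xml (the value 99 mirrors what was applied above; whether raising it is the right fix depends on how full the disks actually are):

<!-- yarn-site.xml: mark a local/log disk as bad only once it is more than 99% full -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>99</value>
</property>

Raising the threshold only postpones the problem; freeing space on the yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs volumes is the longer-term fix.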

Contributor

@Kuldeep Kulkarni

How do I find which disk is marked as bad?

Super Guru

Generally it should be visible in the RM UI once you click on the unhealthy nodes.

OR

You can go to the unhealthy node (http://<unhealthy-node-manager>:8042/jmx) and check the JMX output.

3443-screen-shot-2016-04-14-at-125436-pm.png

Hi @Nilesh,

I am also facing the same issue. Could you please share the solution if you have one?

Explorer

Hi All,

I am also facing the same issue, but my server's hard disk is new and there is no warning/alert at the Ambari level.

Only once my current job completes is the second job allowed to execute.

If my first job runs for 60 minutes, my second job is on hold. Any suggestions?

Server capacity: 16 cores, 64 GB RAM

Thx

Muthu

New Contributor

Hi, this looks like FIFO scheduling / capacity scheduling with only one queue. Try switching to fair scheduling in YARN.
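A minimal sketch of switching the scheduler in yarn-site.xml (the class name follows the upstream Hadoop documentation; the allocation file path is only an illustrative assumption):

<!-- yarn-site.xml: use the FairScheduler instead of the CapacityScheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<!-- optional: queue definitions live in a separate allocation file (path here is illustrative) -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>

Alternatively, staying on the capacity scheduler but raising yarn.scheduler.capacity.maximum-am-resource-percent (discussed earlier in this thread) can also let a second application's AM start while the first job is still running.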

Regards,

Volker
