Failed/Killed task in mapreduce is taking too long to start the next attempt



 

The first attempt of the MapReduce task (Killed) shows a finish time of "Fri Oct 12 08:50:17 -0700 2018" on the ResourceManager UI, and it was showing 100% complete.

The reason given for the kill is "TaskAttempt killed because it ran on unusable node nodemanger1:8041".

 

The next (successful) attempt for the same task started at "Fri Oct 12 11:39:07 -0700 2018".

 

As per the application master logs, the attempt was killed at "2018-10-12 11:39:05,003". Below is a snippet of the application master logs for this specific task:

 

 

./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [IPC Server handler 14 on 37982] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1524681227186_277084_m_000140 Task Transitioned from RUNNING to SUCCEEDED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524681227186_277084_m_000140_0:
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [ContainerLauncher #218] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_COMPLETED for container container_1524681227186_277084_01_000320 taskAttempt attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node dm501.atl2.turn.com:8041. AttemptId:attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCEEDED to KILLED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,011 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1524681227186_277084_m_000140 Task Transitioned from SUCCEEDED to SCHEDULED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,055 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from NEW to UNASSIGNED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,015 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1524681227186_277084_01_098547 to attempt_1524681227186_277084_m_000140_1
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,015 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,018 INFO [ContainerLauncher #186] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1524681227186_277084_01_098547 taskAttempt attempt_1524681227186_277084_m_000140_1
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,018 INFO [ContainerLauncher #186] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1524681227186_277084_m_000140_1
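To pin down exactly where the time goes in the log lines above, a small script can report the longest pause between consecutive entries. This is only a rough sketch: the helper names `parse_ts` and `largest_gap` are made up for illustration, and the sample lines are abridged copies of three entries from the snippet (messages trimmed), not new log output.

```python
import re
from datetime import datetime

# Timestamp format used in the syslog lines above: "2018-10-12 08:50:20,767"
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the first timestamp found in a log line, or None (hypothetical helper)."""
    m = TS_RE.search(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest pause
    between consecutive timestamped lines, or None if fewer than two."""
    stamped = [(parse_ts(l), l) for l in lines]
    stamped = [(t, l) for t, l in stamped if t is not None]
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


# Abridged copies of three lines from the snippet above (messages trimmed).
sample = [
    "./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO ... CONTAINER_COMPLETED ... attempt_1524681227186_277084_m_000140_0",
    "./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,003 INFO ... TaskAttempt killed because it ran on unusable node ...",
    "./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,015 INFO ... Assigned container ... to attempt_1524681227186_277084_m_000140_1",
]

gap, before, after = largest_gap(sample)
print(gap)  # 2:48:44.236000
```

On these lines the longest pause sits between the CONTAINER_COMPLETED event at 08:50:20,767 and the "unusable node" kill at 11:39:05,003. The rescheduling itself (kill to new container assignment) takes about two seconds.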

 

 

Could you please suggest what happened between "08:50" and "11:39" that caused the job to be delayed?