Explorer
Posts: 7
Registered: 10-21-2017

Failed/Killed task in MapReduce is taking too long to start the next attempt


The first attempt of the MapReduce task (later killed) shows a finish time of "Fri Oct 12 08:50:17 -0700 2018" on the ResourceManager UI, and at that point it was showing 100% complete.

The reason given for the kill is: "TaskAttempt killed because it ran on unusable node nodemanger1:8041"

The next (successful) attempt of the same task started at "Fri Oct 12 11:39:07 -0700 2018".
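For reference, the gap between those two ResourceManager UI timestamps can be computed with a short Python sketch. The `ui_gap` helper name is made up for illustration, and parsing `%a`/`%b` assumes an English (C) locale:

```python
from datetime import datetime, timedelta

# Timestamp format shown on the ResourceManager UI, e.g. "Fri Oct 12 08:50:17 -0700 2018"
UI_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def ui_gap(start: str, end: str) -> timedelta:
    """Hypothetical helper: elapsed time between two ResourceManager UI timestamps."""
    return datetime.strptime(end, UI_FORMAT) - datetime.strptime(start, UI_FORMAT)

first_attempt_finish = "Fri Oct 12 08:50:17 -0700 2018"
second_attempt_start = "Fri Oct 12 11:39:07 -0700 2018"

gap = ui_gap(first_attempt_finish, second_attempt_start)
print(gap)  # 2:48:50
```

So roughly 2 hours 49 minutes pass between the first attempt finishing and the second attempt starting.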

According to the ApplicationMaster logs, the task attempt was killed at "2018-10-12 11:39:05,003". Below is a snippet of the ApplicationMaster logs for this specific task:

./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [IPC Server handler 14 on 37982] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1524681227186_277084_m_000140 Task Transitioned from RUNNING to SUCCEEDED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524681227186_277084_m_000140_0:
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [ContainerLauncher #218] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_COMPLETED for container container_1524681227186_277084_01_000320 taskAttempt attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node dm501.atl2.turn.com:8041. AttemptId:attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCEEDED to KILLED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,011 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1524681227186_277084_m_000140 Task Transitioned from SUCCEEDED to SCHEDULED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,055 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from NEW to UNASSIGNED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,015 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1524681227186_277084_01_098547 to attempt_1524681227186_277084_m_000140_1
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,015 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,018 INFO [ContainerLauncher #186] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1524681227186_277084_01_098547 taskAttempt attempt_1524681227186_277084_m_000140_1
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,018 INFO [ContainerLauncher #186] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1524681227186_277084_m_000140_1
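To see where the time actually goes, the TaskAttempt transitions in the log above can be lined up per attempt. This Python sketch is illustrative only: it hard-codes a few of the lines quoted above with the `./container_1524681227186_277084_01_000001/syslog:` grep prefix stripped, and the regex and the `parse_transitions` helper are hypothetical, not part of Hadoop.

```python
import re
from datetime import datetime

# A few AM syslog lines copied from the snippet above (grep prefix removed).
LOG_LINES = [
    "2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER",
    "2018-10-12 08:50:20,767 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED",
    "2018-10-12 11:39:05,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCEEDED to KILLED",
    "2018-10-12 11:39:05,055 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from NEW to UNASSIGNED",
    "2018-10-12 11:39:07,015 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED",
]

# Captures the leading timestamp, the attempt ID, and the from/to states of a transition.
TRANSITION_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"(?P<attempt>attempt_\S+) TaskAttempt Transitioned from (?P<src>\w+) to (?P<dst>\w+)"
)

def parse_transitions(lines):
    """Yield (timestamp, attempt_id, from_state, to_state) for each transition line."""
    for line in lines:
        m = TRANSITION_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            yield ts, m["attempt"], m["src"], m["dst"]

events = list(parse_transitions(LOG_LINES))

# Time at which each attempt entered each state, keyed by (attempt_id, state).
entered = {(attempt, dst): ts for ts, attempt, _, dst in events}

a0 = "attempt_1524681227186_277084_m_000140_0"
a1 = "attempt_1524681227186_277084_m_000140_1"

succeeded_to_killed = entered[(a0, "KILLED")] - entered[(a0, "SUCCEEDED")]
killed_to_assigned = entered[(a1, "ASSIGNED")] - entered[(a0, "KILLED")]

print("attempt _0 sat in SUCCEEDED for", succeeded_to_killed)  # 2:48:44.236000
print("kill -> attempt _1 ASSIGNED took", killed_to_assigned)  # 0:00:02.012000
```

On these lines, nearly all of the gap falls between attempt `_0` reaching SUCCEEDED (08:50:20) and being killed (11:39:05); attempt `_1` was assigned a container about two seconds after the kill.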

Could you please explain what happened between "08:50" and "11:39" that caused the job to be delayed?