Created 10-15-2018 12:00 PM
A failed/killed MapReduce task is taking too long to start its next attempt:
The first attempt of the MapReduce task (Killed) shows a finish time of "Fri Oct 12 08:50:17 -0700 2018" on the ResourceManager UI. It was showing 100% complete.
The reason given for the kill is "TaskAttempt killed because it ran on unusable node nodemanger1:8041".
The next, successful attempt for the same task started at "Fri Oct 12 11:39:07 -0700 2018".
According to the application master logs, the attempt was killed at "2018-10-12 11:39:05,003". Below is a snippet of the application master logs for this task:
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [IPC Server handler 14 on 37982] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:17,542 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1524681227186_277084_m_000140 Task Transitioned from RUNNING to SUCCEEDED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1524681227186_277084_m_000140_0:
./container_1524681227186_277084_01_000001/syslog:2018-10-12 08:50:20,767 INFO [ContainerLauncher #218] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_COMPLETED for container container_1524681227186_277084_01_000320 taskAttempt attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: TaskAttempt killed because it ran on unusable node dm501.atl2.turn.com:8041. AttemptId:attempt_1524681227186_277084_m_000140_0
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_0 TaskAttempt Transitioned from SUCCEEDED to KILLED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,011 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1524681227186_277084_m_000140 Task Transitioned from SUCCEEDED to SCHEDULED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:05,055 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from NEW to UNASSIGNED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,015 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1524681227186_277084_01_098547 to attempt_1524681227186_277084_m_000140_1
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,015 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1524681227186_277084_m_000140_1 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,018 INFO [ContainerLauncher #186] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1524681227186_277084_01_098547 taskAttempt attempt_1524681227186_277084_m_000140_1
./container_1524681227186_277084_01_000001/syslog:2018-10-12 11:39:07,018 INFO [ContainerLauncher #186] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching attempt_1524681227186_277084_m_000140_1
Could you please suggest what happened between "08:50" and "11:39" that is delaying the job?
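To pin down exactly where the time goes, a small script can pull the timestamps out of the application master syslog and report the largest silent gap for one task. The following is only a minimal sketch: the names `TS_RE`, `parse_events`, `largest_gap` and `sample` are hypothetical helpers introduced here, and the sample lines are shortened copies of the entries above (the thread and class names are replaced with `...` for brevity).

```python
import re
from datetime import datetime

# Leading "YYYY-MM-DD HH:MM:SS,mmm" timestamp of an application master syslog entry.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def parse_events(lines, task_key):
    """Return (timestamp, line) pairs for lines mentioning task_key, sorted by time."""
    events = []
    for line in lines:
        if task_key not in line:
            continue
        match = TS_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, line.strip()))
    return sorted(events, key=lambda event: event[0])


def largest_gap(events):
    """Return (gap, line_before, line_after) for the longest pause, or None."""
    best = None
    for (t0, before), (t1, after) in zip(events, events[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Shortened copies of log entries from this post ("..." stands for omitted text).
sample = [
    "2018-10-12 08:50:17,542 INFO ... attempt_1524681227186_277084_m_000140_0 "
    "TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER",
    "2018-10-12 08:50:20,767 INFO ... attempt_1524681227186_277084_m_000140_0 "
    "TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCEEDED",
    "2018-10-12 11:39:05,003 INFO ... attempt_1524681227186_277084_m_000140_0 "
    "TaskAttempt Transitioned from SUCCEEDED to KILLED",
    "2018-10-12 11:39:07,015 INFO ... attempt_1524681227186_277084_m_000140_1 "
    "TaskAttempt Transitioned from UNASSIGNED to ASSIGNED",
]

if __name__ == "__main__":
    result = largest_gap(parse_events(sample, "_m_000140"))
    if result:
        gap, before, after = result
        print("Largest gap:", gap)
        print("  before:", before)
        print("  after: ", after)
```

On the lines above, the longest pause falls between the attempt reaching SUCCEEDED and the later transition from SUCCEEDED to KILLED, while the new attempt is assigned a container about two seconds after the kill. To run it against a real log, replace `sample` with the lines read from the syslog file.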