Support Questions

Find answers, ask questions, and share your expertise

Map jobs are failing with exit code 143

New Contributor

Map jobs are failing with the below error:

Timed out after 600 secs Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143

mapreduce.map.java.opts.max.heap / mapreduce.reduce.java.opts.max.heap is set to 3 GiB.
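
For reference, container memory and JVM heap are configured separately, and the heap must be smaller than the container (a common rule of thumb is heap at roughly 80% of container size). A minimal sketch of setting both per job, assuming the driver supports GenericOptionsParser -D options; the jar, class, paths, and values are placeholders:

# Sketch only: placeholder jar/class/paths; tune values for your cluster.
# The container size (memory.mb) must exceed the JVM heap (-Xmx).
hadoop jar my-job.jar com.example.MyDriver \
  -Dmapreduce.map.memory.mb=4096 \
  -Dmapreduce.map.java.opts=-Xmx3276m \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts=-Xmx3276m \
  /input /output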
6 REPLIES

Champion
Can you post the container logs for one of the containers that was killed? In the RM UI, drill down through the job until you get the list of mappers/reducers that succeeded or failed. Click through to a failed task and then open the logs. You should find an exception in it explaining the reason.

The exit code mentioned usually does indicate a heap issue, but I have seen it reported for other reasons a container was killed, such as when preemption strikes.
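
If the RM UI is hard to navigate, the same logs can be pulled from the command line. A sketch, where both IDs are placeholders you would copy from the job diagnostics (older Hadoop releases may also require -nodeAddress alongside -containerId):

# Placeholder IDs: take the real ones from the job diagnostics or RM UI.
yarn logs -applicationId application_1485607402108_0001 \
  -containerId container_1485607402108_0001_01_000002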

New Contributor

The first error that could be observed was:

Sat Jan 28 12:47:11 GMT 2017, RpcRetryingCaller{globalStartTime=1485607402108, pause=100, retries=35}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region trade_all,devonNewYorkTradeId-0250174430-1,1470766841380.a9301d30e0dbdfadb7a0e08545b08772. is not online on gbrpsr000002816.intranet.barcapint.com,60020,1485545369143

Champion
Does this MR job access HBase at all?

This error indicates that a region of the trade_all table was not accessible.

Any errors on the HBase RegionServers? Check the HBase Master UI to see which RegionServers are serving this region and its splits.
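
As a quick check from the command line, assuming the HBase client is on the path and this is HBase 1.x (HBase 2 moved these checks to HBCK2):

# Report region consistency problems (read-only by default)
hbase hbck

# Look the region up in the meta table; table name taken from the error above
echo "scan 'hbase:meta', {STARTROW => 'trade_all', LIMIT => 10}" | hbase shell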

Expert Contributor

I am facing a similar issue, and after looking at the job history server I see that the last mapper has failed.

The logs from that map task are just this:

2017-10-13 12:15:24,376 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e25_1505390873369_5614_01_000002
2017-10-13 12:15:24,376 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:27 CompletedReds:0 ContAlloc:33 ContRel:4 HostLocal:0 RackLocal:0
2017-10-13 12:15:24,376 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1505390873369_5614_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2017-10-13 12:18:17,290 INFO [IPC Server handler 4 on 55492] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1505390873369_5614_m_000010_0 is : 1.0
2017-10-13 12:18:17,385 INFO [IPC Server handler 3 on 55492] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1505390873369_5614_m_000010_0 is : 1.0
2017-10-13 12:18:17,388 INFO [IPC Server handler 11 on 55492] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1505390873369_5614_m_000010_0
2017-10-13 12:18:17,388 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1505390873369_5614_m_000010_0 TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER
2017-10-13 12:18:17,388 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1505390873369_5614_m_000010_0
2017-10-13 12:18:17,389 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1505390873369_5614_m_000010 Task Transitioned from RUNNING to SUCCEEDED
2017-10-13 12:18:17,389 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 28
2017-10-13 12:18:17,667 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:2 AssignedReds:0 CompletedMaps:28 CompletedReds:0 ContAlloc:33 ContRel:4 HostLocal:0 RackLocal:0
2017-10-13 12:19:23,003 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:attempt_1505390873369_5614_m_000010_0 Timed out after 60 secs
2017-10-13 12:19:23,003 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Task attempt attempt_1505390873369_5614_m_000010_0 is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long
2017-10-13 12:19:23,003 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1505390873369_5614_m_000010_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP
2017-10-13 12:19:23,003 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_e25_1505390873369_5614_01_000012 taskAttempt attempt_1505390873369_5614_m_000010_0
2017-10-13 12:19:23,003 INFO [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1505390873369_5614_m_000010_0
2017-10-13 12:19:23,010 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1505390873369_5614_m_000010_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2017-10-13 12:19:23,775 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_e25_1505390873369_5614_01_000012
2017-10-13 12:19:23,775 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:28 CompletedReds:0 ContAlloc:33 ContRel:4 HostLocal:0 RackLocal:0
2017-10-13 12:19:23,775 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1505390873369_5614_m_000010_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


New Contributor

yarn logs -applicationId <application ID> should help. This typically occurs due to improper container memory allocation relative to the physical memory available on the cluster.
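
A sketch of pulling the aggregated logs and filtering for the usual kill messages; the application ID is a placeholder (taken from the logs above), and YARN log aggregation must be enabled:

# Placeholder ID: take the real one from the RM UI or the job output
APP_ID=application_1505390873369_5614
yarn logs -applicationId "$APP_ID" > "$APP_ID.log"
grep -iE "killed|exit code|beyond (physical|virtual) memory" "$APP_ID.log"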

Master Guru

@Kshitij Shrivastava,


The "Timed out after 600 secs Container killed by the ApplicationMaster" message indicates that the application master did not see any progress in the Task for 10 minutes (default timeout) so the Application Master killed it.

The question is what the task was doing such that no progress was detected.


I'd recommend looking at the application logs for clues about what the task was doing when it was killed.

Use the Resource Manager UI or command line like this to get the logs:

yarn logs -applicationId <application ID> <options>
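
If the task turns out to be legitimately busy (for example, blocked on a slow external call) rather than hung, the timeout can be raised per job. A sketch with placeholder jar, class, and paths; note this masks the symptom rather than fixing a task that genuinely reports no progress:

# mapreduce.task.timeout is in milliseconds; 0 disables the check entirely.
hadoop jar my-job.jar com.example.MyDriver \
  -Dmapreduce.task.timeout=1200000 \
  /input /output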


Regards,


Ben