
Map tasks succeeding but not releasing resources


Hello community,

 

I'm having issues with some MapReduce jobs I'm running (a simple WordCount and TeraGen/TeraSort).

The jobs run fine and succeed, but they are quite slow. I noticed that the map tasks finish after a few seconds but do not release their containers; only about 60 seconds later does the ApplicationMaster kill the containers.

 

What are potential reasons for that behaviour and how can I resolve it?

 

My setup is a single-node Cloudera cluster (CDH 5.5). YARN has 4 vCores and 8 GB of RAM available to allocate to containers, and each map task requests 1 vCore and 1 GB of RAM.
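For reference, I assume this corresponds roughly to the following YARN/MapReduce properties (a sketch reconstructed from my settings, not copied verbatim from my configuration files):

<!-- yarn-site.xml: resources the single NodeManager offers for containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>

<!-- mapred-site.xml: container size requested per map task -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>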

 

Neither the NodeManager's nor the map task's logs show anything suspicious: no JVM errors, and the allocated container memory is not exceeded. Here is an extract from the ApplicationMaster's log:

 

 

2015-12-22 10:28:51,569 INFO [IPC Server handler 5 on 37397] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1450544559411_0015_m_000017_0 is : 1.0
2015-12-22 10:28:51,572 INFO [IPC Server handler 4 on 37397] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1450544559411_0015_m_000017_0
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1450544559411_0015_m_000017_0
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1450544559411_0015_m_000017 Task Transitioned from RUNNING to SUCCEEDED
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 17
...
2015-12-22 10:30:01,405 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:attempt_1450544559411_0015_m_000017_0 Timed out after 60 secs
2015-12-22 10:30:01,405 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Task attempt attempt_1450544559411_0015_m_000017_0 is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long
2015-12-22 10:30:01,405 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP
2015-12-22 10:30:01,406 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1450544559411_0015_01_000019 taskAttempt attempt_1450544559411_0015_m_000017_0
2015-12-22 10:30:01,407 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1450544559411_0015_m_000017_0
2015-12-22 10:30:01,409 INFO [ContainerLauncher #8] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : npshadoop02.cc.de:8041
2015-12-22 10:30:01,425 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-12-22 10:30:02,488 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1450544559411_0015_01_000017

2015-12-22 10:30:02,489 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1450544559411_0015_m_000017_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

 

I think this line in particular is striking:

 

Task attempt [...] is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long
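From what I understand, once a task reports completion the ApplicationMaster keeps the attempt in the SUCCESS_FINISHING_CONTAINER state and waits for the container JVM to exit on its own; only when that does not happen within a timeout (apparently 60 seconds by default) does it kill the container, which matches the log above. If that is correct, the wait could presumably be shortened with something like the following in mapred-site.xml. I have not verified this property on CDH 5.5, so please treat it as an assumption on my part:

<!-- Assumed property: how long the AM waits for a finished task's JVM
     to exit before killing the container (milliseconds) -->
<property>
  <name>mapreduce.task.exit.timeout</name>
  <value>60000</value>
</property>

That would only hide the symptom, though; I would still like to understand why the task JVMs do not exit on their own in the first place.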

 

Best regards,

Benjamin


Re: Map tasks succeeding but not releasing resources

Hi Benjamin,

I have the same issue:

Task attempt [...] is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long

Did you solve it?

jhewei
