Created on 12-22-2015 03:18 AM - edited 12-22-2015 04:40 AM
Hello community,
I have some issues with MapReduce jobs I'm running (simple WordCount or TeraGen/TeraSort).
The jobs run fine and succeed, but they are quite slow. I noticed that the map tasks finish after a few seconds but do not release their containers. Only after 60 seconds does the ApplicationMaster kill the containers.
What are potential reasons for that behaviour and how can I resolve it?
My setup is a single-node Cloudera 5.5 cluster. The ResourceManager has 4 vCPUs and 8 GB RAM to allocate, and each map task uses 1 vCPU and 1 GB RAM.
Neither the NodeManager log nor the map task logs show anything unusual: there are no JVM errors, and the allocated container memory is not exceeded. Here is an extract of the ApplicationMaster log:
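These numbers suggest why lingering containers hurt so much on a cluster this small. The Python sketch below is only a rough estimate under stated assumptions: the scheduler counts both memory and vCPUs (with memory-only scheduling, up to 8 containers would fit instead of 4), the ApplicationMaster occupies one task-sized container, map tasks run in full waves, each map task runs for a hypothetical 5 seconds, and there are a hypothetical 18 map tasks (the log shows attempt index m_000017). The 60-second hold corresponds to the "Timed out after 60 secs" message in the log.

```python
import math

def concurrent_containers(total_mb, total_vcores, task_mb, task_vcores, am_containers=1):
    """Task containers that fit at once.

    Assumptions: the scheduler checks both memory and vCPUs, and the
    ApplicationMaster takes one task-sized slot.
    """
    by_memory = total_mb // task_mb
    by_cpu = total_vcores // task_vcores
    return min(by_memory, by_cpu) - am_containers

def map_phase_seconds(num_maps, slots, task_seconds, linger_seconds):
    """Rough map-phase wall-clock time: tasks run in full waves, and each
    slot stays occupied for the task time plus the time the finished
    container lingers before it is released."""
    waves = math.ceil(num_maps / slots)
    return waves * (task_seconds + linger_seconds)

slots = concurrent_containers(8 * 1024, 4, 1024, 1)
print(slots)
print(map_phase_seconds(18, slots, 5, 0))   # containers released immediately
print(map_phase_seconds(18, slots, 5, 60))  # containers held for 60 s each
```

Under these assumptions the map phase takes about 390 seconds with the 60-second hold versus about 30 seconds without it, so the idle containers, rather than the map work itself, would dominate the runtime.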
2015-12-22 10:28:51,569 INFO [IPC Server handler 5 on 37397] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1450544559411_0015_m_000017_0 is : 1.0
2015-12-22 10:28:51,572 INFO [IPC Server handler 4 on 37397] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1450544559411_0015_m_000017_0
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1450544559411_0015_m_000017_0
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1450544559411_0015_m_000017 Task Transitioned from RUNNING to SUCCEEDED
2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 17
...
2015-12-22 10:30:01,405 INFO [Ping Checker] org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:attempt_1450544559411_0015_m_000017_0 Timed out after 60 secs
2015-12-22 10:30:01,405 WARN [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Task attempt attempt_1450544559411_0015_m_000017_0 is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long
2015-12-22 10:30:01,405 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned from SUCCESS_FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP
2015-12-22 10:30:01,406 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1450544559411_0015_01_000019 taskAttempt attempt_1450544559411_0015_m_000017_0
2015-12-22 10:30:01,407 INFO [ContainerLauncher #8] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1450544559411_0015_m_000017_0
2015-12-22 10:30:01,409 INFO [ContainerLauncher #8] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : npshadoop02.cc.de:8041
2015-12-22 10:30:01,425 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-12-22 10:30:02,488 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1450544559411_0015_01_000017
2015-12-22 10:30:02,489 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1450544559411_0015_m_000017_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
I think this line in particular is telling:
Task attempt [...] is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long
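To check whether every map attempt, not just m_000017, lingers for the full timeout, the ApplicationMaster log can be scanned for the TaskAttempt transitions shown above. This is a minimal Python sketch, assuming the log lines match the format of the excerpt (the regular expression is written for these lines only and may need adjusting for other Hadoop versions); the function name finishing_durations is hypothetical. It measures the time between "Transitioned from RUNNING to SUCCESS_FINISHING_CONTAINER" and the transition out of SUCCESS_FINISHING_CONTAINER.

```python
import re
from datetime import datetime

# Matches a TaskAttempt state transition in the ApplicationMaster log format
# shown above (assumption: other Hadoop versions may word these lines differently).
TRANSITION = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"(?P<attempt>attempt_\d+_\d+_[mr]_\d+_\d+) TaskAttempt Transitioned "
    r"from (?P<src>\w+) to (?P<dst>\w+)"
)

def finishing_durations(lines):
    """Return the seconds each attempt spent in SUCCESS_FINISHING_CONTAINER."""
    entered = {}    # attempt id -> time it entered SUCCESS_FINISHING_CONTAINER
    durations = {}  # attempt id -> seconds until it left that state
    for line in lines:
        m = TRANSITION.match(line)
        if not m:
            continue
        # "%f" accepts the 3-digit millisecond field after the comma.
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        attempt = m.group("attempt")
        if m.group("dst") == "SUCCESS_FINISHING_CONTAINER":
            entered[attempt] = ts
        elif m.group("src") == "SUCCESS_FINISHING_CONTAINER" and attempt in entered:
            durations[attempt] = (ts - entered.pop(attempt)).total_seconds()
    return durations

# Two lines copied from the log excerpt above.
SAMPLE_LINES = [
    "2015-12-22 10:28:51,573 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: "
    "attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned "
    "from RUNNING to SUCCESS_FINISHING_CONTAINER",
    "2015-12-22 10:30:01,405 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: "
    "attempt_1450544559411_0015_m_000017_0 TaskAttempt Transitioned "
    "from SUCCESS_FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP",
]

if __name__ == "__main__":
    for attempt, seconds in finishing_durations(SAMPLE_LINES).items():
        print(f"{attempt}: {seconds:.1f} s in SUCCESS_FINISHING_CONTAINER")
```

For the two sample lines this reports roughly 70 seconds for attempt_1450544559411_0015_m_000017_0, consistent with the "Timed out after 60 secs" message plus some delay before the expiry was detected. For a full log, the lines of the log file can be passed to finishing_durations instead of SAMPLE_LINES.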
Best regards,
Benjamin