Is ShuffleHandler part of container?

I asked a similar question before, and I thought I would need to increase the container size to fix the following error:

2015-10-21 07:25:12,720 ERROR mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(1053)) - Shuffle error: 
java.lang.OutOfMemoryError: GC overhead limit exceeded 
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300) 
at java.lang.StringCoding.encode(StringCoding.java:344) 
at java.lang.String.getBytes(String.java:916) 
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) 
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242) 
-- 
at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) 
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) 
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) 
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) 
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)

But now I'm not sure whether the netty process is part of the container or not.

Should I increase the NodeManager Java heap size?

Re: Is ShuffleHandler part of container?

@Hajime

Looking at the source code of the shuffle handler, I would say let's increase HADOOP_HEAPSIZE:

1053      LOG.error("Shuffle error: ", cause);
1054      if (ch.isConnected()) {
1055        LOG.error("Shuffle error " + e);
1056        sendError(ctx, INTERNAL_SERVER_ERROR);
1057      }

Re: Is ShuffleHandler part of container?

You mean I should increase the heap of NodeManager process, correct?

Re: Is ShuffleHandler part of container?

@Hajime Yes, please. Let's start with that. Do you have SmartSense installed?

Re: Is ShuffleHandler part of container?

ShuffleHandler is an NM auxiliary service, so it uses the NM daemon's memory rather than the container's. Increasing the container's heap size may therefore not help in your case. You should increase the NM daemon's heap size instead, as suggested by @Neeraj Sabharwal. @Hajime
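
For context, here is a sketch of how the shuffle service is typically registered with the NodeManager on Hadoop 2.x. The property names are the standard YARN ones, but the fragment is illustrative and not taken from the cluster in question, so the values on any given cluster are an assumption:

```xml
<!-- yarn-site.xml (fragment): registers the MapReduce shuffle as a
     NodeManager auxiliary service. Typical values, assumed rather than
     copied from the cluster discussed in this thread. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <!-- The class named here is loaded inside the NodeManager JVM. -->
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```

Because the NodeManager loads this class into its own JVM, the heap that matters for the error above is the NodeManager's (the "NodeManager Java heap size" setting mentioned in the question), not container-level settings such as mapreduce.reduce.java.opts.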

Re: Is ShuffleHandler part of container?

Thank you!

Re: Is ShuffleHandler part of container?

@Hajime

It appears you need to increase the Reducer Container heap size.

Source : https://hadoop.apache.org/docs/stable/hadoop-mapre...

Snippet : (See section Implementing a Custom Shuffle and a Custom Sort)

... and a org.apache.hadoop.mapred.ShuffleConsumerPlugin implementation class running in the Reducer tasks.
The default implementations provided by Hadoop can be used as references:
org.apache.hadoop.mapred.ShuffleHandler
org.apache.hadoop.mapreduce.task.reduce.Shuffle
...
...

Re: Is ShuffleHandler part of container?

I originally thought something similar, but now I think ShuffleHandler is just reporting an error from netty, and I'm thinking netty runs inside the NodeManager, no?

Re: Is ShuffleHandler part of container?

@Hajime Sorry for the delay in responding. There is definitely a dependency on the NodeManager, but the shuffle itself takes place in the Reducer. The reference I provided earlier also mentions the dependency on the NodeManager.

Can you share any specific details of the cluster config, such as the size of the cluster, heap sizes, etc.?

If you could, please update this thread with your findings.

Re: Is ShuffleHandler part of container?

@Hajime are you still having issues? Can you accept the best answer or provide your own solution?