Support Questions


MR jobs failed on IO and java heap size

Master Collaborator
Hi,
 
I have an MR job running with 30 reducers. One of the reducers fails with the error below when it reaches a specific percentage:
 
I increased the reducer memory, but with no success. I'm investigating the data to find whether a specific key has a lot of values and caused this.
 
My point is, I can't tell from the error below whether it's an IO issue (which might be limited connections between nodes, in which case I need to increase the ulimit) or a memory issue (in which case I need to increase the memory further).
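As a quick way to check for key skew, the key frequencies of the reducer input can be counted with a shell pipeline. This is only a sketch: it assumes the data is tab-separated text with the key in the first field, and the `printf` sample stands in for the real input (in practice you would pipe `hdfs dfs -cat /your/input/part-*` instead).

```shell
# Hypothetical skew check on sample data: count records per key (first field).
# Replace the printf sample with `hdfs dfs -cat /your/input/part-*` for real data.
printf 'k1\tv\nk1\tv\nk2\tv\nk1\tv\n' \
  | cut -f1 \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -5
```

If one key dominates the counts, that reducer will receive a disproportionate share of the data, which would explain a single reducer running out of heap.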
 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/liveperson/hadoop/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/liveperson/data/server_hdfs/data/disk6/yarn/nm/usercache/lereports/appcache/application_1486847749225_242069/filecache/10/job.jar/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Halting due to Out Of Memory Error...
 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "DataStreamer for file /liveperson/data/server_live-engage-mr/output/1490104329046-bi_contribution_xsess/_temporary/1/_temporary/attempt_1486847749225_242069_r_000001_0/RAWDATA-b_default-RPT_FA_CONTRIBUTION_XSESSION-r-00001 block BP-1370881566-172.16.144.147-1434971434689:blk_1283227391_209512600"
Mar 21, 2017 10:34:10 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
at com.datastax.shaded.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
at com.datastax.shaded.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
at com.datastax.shaded.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at com.datastax.shaded.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
at com.datastax.shaded.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
at com.datastax.shaded.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at com.datastax.shaded.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at com.datastax.shaded.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at com.datastax.shaded.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
 
Mar 21, 2017 11:02:11 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
 
Mar 21, 2017 11:08:50 AM com.datastax.shaded.netty.util.HashedWheelTimer
WARNING: An exception was thrown by TimerTask.
java.lang.OutOfMemoryError: Java heap space
 
Halting due to Out Of Memory Error...
Mar 21, 2017 11:11:10 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
 
Mar 21, 2017 11:27:11 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
 
 
Log Type: stdout
Log Upload Time: Tue Mar 21 13:49:54 -0400 2017
Log Length: 0
 
Log Type: syslog
Log Upload Time: Tue Mar 21 13:49:54 -0400 2017
Log Length: 3997692
Showing 4096 bytes of 3997692 total.
enewer.renew(LeaseRenewer.java:423)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.protobuf.ServiceException: java.lang.RuntimeException: unexpected checked exception
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:244)
at com.sun.proxy.$Proxy13.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
... 12 more
Caused by: java.lang.RuntimeException: unexpected checked exception
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1059)
at org.apache.hadoop.ipc.Client.call(Client.java:1445)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
... 14 more
Caused by: java.lang.OutOfMemoryError: Java heap space
2017-03-21 11:01:36,323 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "svpr-dhc024.lpdomain.com/172.16.144.172"; destination host is: "svpr-dhc012.lpdomain.com":45935; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
at com.sun.proxy.$Proxy8.ping(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:782)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
at org.apache.hadoop.ipc.Client.call(Client.java:1442)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
 
2017-03-21 11:06:36,820 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "svpr-dhc024.lpdomain.com/172.16.144.172"; destination host is: "svpr-dhc012.lpdomain.com":45935; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
at com.sun.proxy.$Proxy8.ping(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:782)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
at org.apache.hadoop.ipc.Client.call(Client.java:1442)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
 
2017-03-21 11:11:10,785 INFO [LeaseRenewer:lereports@VAProd] org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB over svpr-mhc102.lpdomain.com/172.16.144.148:8020 after 1 fail over attempts. Trying to fail over after sleeping for 593ms.
2017-03-21 11:16:47,403 ERROR [LeaseRenewer:lereports@VAProd] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LeaseRenewer:lereports@VAProd,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: Java heap space
2017-03-21 11:34:56,644 INFO [LeaseRenewer:lereports@VAProd] org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException
3 REPLIES

Expert Contributor

Hi @Fawze ,

 

Did you manage to solve this? We're also hitting the same issue.

 

Thanks,

Megh

Master Collaborator

Hi @vidanimegh, if you are talking about a MapReduce job, you need to check whether it is failing in the map or the reduce phase.

 

If the error is Java heap space, you need to increase the Java heap size for the mapper or reducer using mapreduce.map.java.opts or mapreduce.reduce.java.opts.
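For example, these properties can be passed on the command line when submitting the job. This is a sketch with illustrative values: my-job.jar, MyJobClass, and the paths are placeholders, and the -D flags are only picked up if the driver uses ToolRunner/GenericOptionsParser.

```shell
# Illustrative values: raise the reduce container to 4 GB and its JVM heap
# to roughly 80% of that; do the same proportionally for the map side.
# my-job.jar, MyJobClass, and the input/output paths are placeholders.
hadoop jar my-job.jar MyJobClass \
  -D mapreduce.map.memory.mb=2048 \
  -D mapreduce.map.java.opts=-Xmx1638m \
  -D mapreduce.reduce.memory.mb=4096 \
  -D mapreduce.reduce.java.opts=-Xmx3276m \
  /input/path /output/path
```

Keeping the -Xmx value noticeably below the container size (mapreduce.*.memory.mb) leaves headroom for off-heap usage, otherwise YARN may kill the container even when the JVM heap itself is fine.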

Expert Contributor

Hi @Fawze ,

 

This is happening for ANALYZE TABLE commands, which I think are map-only.

 

I've tried the heap space options you have mentioned and they're not helping.

 

Thanks,

Megh