data nodes evicted randomly and cluster marks node for decomm

Explorer

Hello all,

I am working on an engagement where DataNodes are evicted at random and the cluster marks them for decommissioning. The DataNode processes then have to be killed and restarted.

This happens at random and is difficult to reproduce. I am attaching the error log from the hadoop-hdfs DataNode.

2014-11-19 07:35:13,847 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: mdata07:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.10.103:46686 dest: /10.10.10.107:50010
java.lang.OutOfMemoryError: Java heap space
 at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:120)
 at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:68)
 at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
 at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:409)
 at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:325)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:623)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
 at java.lang.Thread.run(Thread.java:744)
2014-11-19 07:36:16,565 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode{data=FSDataset{dirpath='[/data-mount/hadoop/dfs/dn/current]'}, localName='mdata07:50010', datanodeUuid='7181ecc9-ab8e-491a-b37b-b5be724701af', xmitsInProgress=0}:Exception transfering block BP-2015128538-10.10.10.10-1403613223603:blk_1088831462_15096140 to mirror 10.10.10.101:50010: java.io.EOFException: Premature EOF: no length prefix available
2014-11-19 07:36:15,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.10.10.10.107, datanodeUuid=7181ecc9-ab8e-491a-b37b-b5be724701af, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-5ee917ca-3875-4db6-bfef-2ebdc160b420;nsid=185559117;c=0):Exception writing BP-2015128538-10.10.10.10.10-1403613223603:blk_1088831431_15096109 to mirror 10.10.10.10.105:50010
java.io.IOException: Broken pipe
 at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
 at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
 at sun.nio.ch.IOUtil.write(IOUtil.java:65)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
 at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
 at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
 at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
 at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
 at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
 at java.io.DataOutputStream.write(DataOutputStream.java:107)
 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.mirrorPacketTo(PacketReceiver.java:200)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:494)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
 at java.lang.Thread.run(Thread.java:744)

1 ACCEPTED SOLUTION

Re: data nodes evicted randomly and cluster marks node for decomm

Explorer

I also noticed this error:

HAS_DOWNSTREAM_IN_PIPELINE
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1988)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1083)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:39:06,712 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2015128538-10.10.10.10-1403613223603:blk_1088878247_15143005 src: /10.10.10.100:52326 dest: /10.10.10.100:50010
2014-11-19 11:39:44,941 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2015128538-10.10.10.10-1403613223603:blk_1088878255_15143013 src: /10.10.10.104:57300 dest: /10.10.10.100:50010
2014-11-19 11:39:43,972 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2015128538-10.10.10.10-1403613223603:blk_1088878236_15142994
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:39:47,664 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1306)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1167)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:40:44,444 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2015128538-10.10.10.10-1403613223603:blk_1088878236_15142994, type=HAS_DOWNSTREAM_IN_PIPELINE
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1306)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1167)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:40:59,863 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2015128538-10.10.10.10-1403613223603:blk_1088878236_15142994, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2014-11-19 11:39:39,534 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2015128538-10.10.10.10-1403613223603:blk_1088878251_15143009 src: /10.10.10.100:52327 dest: /10.10.10.100:50010
2014-11-19 11:39:27,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2015128538-10.10.10.10-1403613223603:blk_1088878111_15142869
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:744)
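
With several exception types interleaved like this, a quick tally can help show which one dominates. The following is only a rough sketch in Python, not a Hadoop tool: the function name tally_exceptions and the sample lines are made up for illustration, and it only counts lines that begin with a java.* exception class name (exceptions embedded in the middle of a log line are ignored).

```python
import re
from collections import Counter

# Matches a line that starts with a fully qualified java.* exception or error
# class, e.g. "java.io.IOException: Broken pipe".
EXCEPTION_AT_START = re.compile(r"^java\.[\w.$]+(?:Exception|Error)\b")


def tally_exceptions(lines):
    """Count exception class names that appear at the start of a log line."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_AT_START.match(line.strip())
        if match:
            counts[match.group(0)] += 1
    return counts


# Illustrative sample only, shaped like the DataNode log lines in this thread.
sample_lines = [
    "java.lang.OutOfMemoryError: Java heap space",
    " at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:120)",
    "java.io.IOException: Broken pipe",
    "        at java.lang.Thread.run(Thread.java:744)",
    "java.io.EOFException: Premature EOF: no length prefix available",
    "java.io.IOException: Premature EOF from inputStream",
]

for name, count in tally_exceptions(sample_lines).most_common():
    print(name, count)
```

To run it against a real file, pass an open file object instead of sample_lines; the log path depends on your installation.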

2 REPLIES

Re: data nodes evicted randomly and cluster marks node for decomm

Contributor

I see the following error in the log:

java.lang.OutOfMemoryError: Java heap space

How much heap memory is currently allocated to the DataNode?

Can you try increasing the DataNode heap size?
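
One way to check the current allocation is to look at the -Xmx flag on the running DataNode's java command line (for example in the output of ps). Below is a minimal Python sketch that pulls the value out of such a command line. The function name max_heap_bytes and the sample command line are hypothetical, and it assumes that when -Xmx appears more than once the JVM uses the last occurrence, which is how HotSpot generally behaves.

```python
import re

# Multipliers for the size suffixes commonly used with -Xmx.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# -Xmx followed by digits and an optional k/m/g suffix, as a standalone argument.
_XMX = re.compile(r"(?:^|\s)-Xmx(\d+)([kKmMgG]?)(?=\s|$)")


def max_heap_bytes(cmdline):
    """Return the last -Xmx value on the command line in bytes, or None."""
    matches = _XMX.findall(cmdline)
    if not matches:
        return None
    number, unit = matches[-1]  # assumption: the last -Xmx is the effective one
    return int(number) * _UNITS[unit.lower()]


# Hypothetical command line; the real one on your host will differ.
sample = (
    "java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs "
    "-Xmx4g org.apache.hadoop.hdfs.server.datanode.DataNode"
)

heap = max_heap_bytes(sample)
print("DataNode max heap: %d MiB" % (heap // 1024 ** 2))
```

If the value turns out to be small for the number of blocks and concurrent transfers the node handles, that would be consistent with the OutOfMemoryError above. Where the heap size is actually set depends on how the cluster is managed.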