Explorer
Posts: 9
Registered: ‎11-19-2014
Accepted Solution

data nodes evicted randomly and cluster marks node for decomm

Hello all,

I am working on a gig where data nodes are evicted randomly and the cluster marks the node for decommissioning. The DataNode processes have to be killed and restarted.

 

This is a random event and difficult to replicate. I am attaching the error log from the hadoop-hdfs DataNode.

 

2014-11-19 07:35:13,847 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: mdata07:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.10.103:46686 dest: /10.10.10.107:50010
java.lang.OutOfMemoryError: Java heap space
 at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:120)
 at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:68)
 at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
 at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:409)
 at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:325)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:623)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
 at java.lang.Thread.run(Thread.java:744)
2014-11-19 07:36:16,565 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode{data=FSDataset{dirpath='[/data-mount/hadoop/dfs/dn/current]'}, localName='mdata07:50010', datanodeUuid='7181ecc9-ab8e-491a-b37b-b5be724701af', xmitsInProgress=0}:Exception transfering block BP-2015128538-10.10.10.10-1403613223603:blk_1088831462_15096140 to mirror 10.10.10.101:50010: java.io.EOFException: Premature EOF: no length prefix available
2014-11-19 07:36:15,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.10.10.107, datanodeUuid=7181ecc9-ab8e-491a-b37b-b5be724701af, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-5ee917ca-3875-4db6-bfef-2ebdc160b420;nsid=185559117;c=0):Exception writing BP-2015128538-10.10.10.10-1403613223603:blk_1088831431_15096109 to mirror 10.10.10.105:50010
java.io.IOException: Broken pipe
 at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
 at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
 at sun.nio.ch.IOUtil.write(IOUtil.java:65)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
 at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
 at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
 at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
 at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
 at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
 at java.io.DataOutputStream.write(DataOutputStream.java:107)
 at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.mirrorPacketTo(PacketReceiver.java:200)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:494)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
 at java.lang.Thread.run(Thread.java:744)
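Since the root error above is `java.lang.OutOfMemoryError: Java heap space`, it may help to capture the running DataNode's actual JVM heap flags the next time this happens. A minimal sketch, assuming a standard Linux host where the DataNode runs as a normal JVM process (the `pgrep` pattern is an assumption and may need adjusting for your distribution):

```shell
# Locate the DataNode JVM (pattern is a guess; adjust for your setup)
pid=$(pgrep -f 'proc_datanode|hdfs.server.datanode.DataNode' | head -n 1)

# Print only the heap-related flags from its command line
ps -o args= -p "$pid" | tr ' ' '\n' | grep -E '^-Xm[sx]'

# If no -Xmx line appears, the JVM default heap (often 1/4 of RAM) is in effect
```

Knowing the current `-Xmx` value makes it much easier to judge whether the later suggestion of raising the heap is likely to help.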

Explorer
Posts: 9
Registered: ‎11-19-2014

Re: data nodes evicted randomly and cluster marks node for decomm

Also noticed this error:

HAS_DOWNSTREAM_IN_PIPELINE
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1988)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1083)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:39:06,712 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2015128538-10.10.10.10-1403613223603:blk_1088878247_15143005 src: /10.10.10.100:52326 dest: /10.10.10.100:50010
2014-11-19 11:39:44,941 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2015128538-10.10.10.10-1403613223603:blk_1088878255_15143013 src: /10.10.10.104:57300 dest: /10.10.10.100:50010
2014-11-19 11:39:43,972 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2015128538-10.10.10.10-1403613223603:blk_1088878236_15142994
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:39:47,664 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run():
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1306)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1167)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:40:44,444 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2015128538-10.10.10.10-1403613223603:blk_1088878236_15142994, type=HAS_DOWNSTREAM_IN_PIPELINE
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1306)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1167)
        at java.lang.Thread.run(Thread.java:744)
2014-11-19 11:40:59,863 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-2015128538-10.10.10.10-1403613223603:blk_1088878236_15142994, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2014-11-19 11:39:39,534 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2015128538-10.10.10.10-1403613223603:blk_1088878251_15143009 src: /10.10.10.100:52327 dest: /10.10.10.100:50010
2014-11-19 11:39:27,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2015128538-10.10.10.10-1403613223603:blk_1088878111_15142869
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:711)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
        at java.lang.Thread.run(Thread.java:744)

Explorer
Posts: 46
Registered: ‎03-25-2017

Re: data nodes evicted randomly and cluster marks node for decomm

I see the below error in the log:

java.lang.OutOfMemoryError: Java heap space

 

So I would like to know how much heap memory you have allocated right now.

Can you try increasing the heap size of the DataNode?
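For a vanilla Apache Hadoop install, the DataNode heap is usually raised via `HADOOP_DATANODE_OPTS` in `hadoop-env.sh` (managed distributions expose the same setting through their UI instead). A sketch, with the 4 GB value as an example starting point, not a recommendation:

```shell
# In etc/hadoop/hadoop-env.sh (example values; size to your workload)
# Prepend heap flags so any existing options are preserved
export HADOOP_DATANODE_OPTS="-Xms4g -Xmx4g ${HADOOP_DATANODE_OPTS}"
```

One hedged note: the OOM in the first trace is thrown while opening an epoll selector (`EPollArrayWrapper.<init>`), which can sometimes point at file-descriptor or thread limits (`ulimit -n`, `ulimit -u`, `dfs.datanode.max.transfer.threads`) rather than heap alone, so those are worth checking too.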
