Created 10-30-2017 09:46 AM
I am not able to load files into HDFS; I get the following error. The file size is 300 MB. Splitting the file into smaller pieces mostly works, but still fails sporadically with the same error.
All datanodes DatanodeInfoWithStorage[10.11.12.11:50010,DS-835fbe86-c1f5-4967-80a4-1e84e7854425,DISK] are bad.
d-98a8-4c9c-9bcc-de6ada2d290c,DISK]: bad datanode DatanodeInfoWithStorage[10.11.12.14:50010,DS-e6fd4e6d-98a8-4c9c-9bcc-de6ada2d290c,DISK]
17/10/30 15:17:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
 at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2464)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1461)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1302)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:999)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:506)
17/10/30 15:17:45 WARN hdfs.DFSClient: Error Recovery for block BP-2139487625-10.70.12.115-1447775100056:blk_1073910746_170708 in pipeline DatanodeInfoWithStorage[10.11.12.11:50010,DS-43a97303-ce81-4953-9adb-560131c8a440,DISK], DatanodeInfoWithStorage[10.11.12.12:50010,DS-78bef79e-6a05-4118-9ccd-fe10a88df453,DISK]: bad datanode DatanodeInfoWithStorage[:50010,DS-43a97303-ce81-4953-9adb-560131c8a440,DISK]
17/10/30 15:18:50 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.70.12.119:54084 remote=/10.70.12.118:50010]
 at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at java.io.FilterInputStream.read(FilterInputStream.java:83)
 at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2462)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1461)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1302)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:999)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:506)
put: All datanodes DatanodeInfoWithStorage[10.11.12.11:50010,DS-78bef79e-6a05-4118-9ccd-fe10a88df453,DISK] are bad. Aborting...
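For reference, the upload is just a plain put from the local filesystem; the paths below are placeholders, not the real ones:

# placeholder paths; the actual source is a ~300 MB local file
hdfs dfs -put /local/data/file_300mb.dat /user/hdfs/landing/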
Created 10-30-2017 09:53 AM
Can you please check and share the complete DataNode log?
It should show the actual cause of the issue, for example a "too many open files" error or something similar in your DataNode logs.
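If it helps, something like the following should surface it (the log path assumed here is the usual default; adjust it to your environment):

# look for file-descriptor exhaustion and other ERROR/FATAL lines in the DataNode log
grep -iE "too many open files|ERROR|FATAL" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | tail -n 50
# and confirm the open-file limit for the hdfs user
su - hdfs -c "ulimit -n"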
Created 10-30-2017 01:42 PM
The log is attached as logs.txt; the relevant portion is below.
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1590)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1525)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1438)
at java.lang.Thread.run(Thread.java:745)
2017-10-30 19:04:52,235 INFO datanode.DataNode (BlockReceiver.java:run(1449)) - PacketResponder: BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.11.12.11:50010, 10.11.12.12:50010]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1590)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1525)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1438)
at java.lang.Thread.run(Thread.java:745)
2017-10-30 19:04:52,236 INFO datanode.DataNode (BlockReceiver.java:run(1463)) - PacketResponder: BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.11.12.11:50010, 10.11.12.12:50010] terminating
2017-10-30 19:04:52,242 INFO datanode.DataNode (DataXceiver.java:writeBlock(669)) - Receiving BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751 src: /10.11.12.15:48177 dest: /10.11.12.15:50010
2017-10-30 19:04:52,243 INFO impl.FsDatasetImpl (FsDatasetImpl.java:recoverClose(1306)) - Recover failed close BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751
2017-10-30 19:05:52,248 ERROR datanode.DataNode (DataXceiver.java:writeBlock(787)) - DataNode{data=FSDataset{dirpath='[/hadoop/hdfs/data/current, /hdfs/current]'}, localName='comu5.baidu.cn:50010', datanodeUuid='2d2bf8fa-6617-43b5-a379-5d236e6c0987', xmitsInProgress=0}:Exception transfering block BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751 to mirror 10.11.12.11:50010: java.io.EOFException: Premature EOF: no length prefix available
2017-10-30 19:05:52,249 INFO datanode.DataNode (DataXceiver.java:writeBlock(850)) - opWriteBlock BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751 received exception java.io.EOFException: Premature EOF: no length prefix available
2017-10-30 19:05:52,249 ERROR datanode.DataNode (DataXceiver.java:run(278)) - comu5.baidu.cn:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.11.12.15:48177 dst: /10.11.12.15:50010
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2464)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:758)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
at java.lang.Thread.run(Thread.java:745)
Created 10-30-2017 02:22 PM
It looks like you are running your NameNode / DataNodes using the IP address instead of the hostname.
Can you please confirm whether you are using the hostname (FQDN) or the IP address?
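For example, on one of the affected nodes you can check it roughly like this (these are the standard HDFS settings; your cluster's values may differ):

# confirm the node resolves to a proper FQDN
hostname -f
# check whether clients and DataNodes are configured to use hostnames instead of IPs
hdfs getconf -confKey dfs.client.use.datanode.hostname
hdfs getconf -confKey dfs.datanode.use.datanode.hostname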
Created 10-30-2017 02:25 PM
Yes, I am using hostnames. The IPs in the logs seem to be automatically resolved from the hostnames.
Created 10-31-2017 12:38 AM
The error happens for larger files (300 MB+), but I have successfully uploaded files larger than 5 GB in the past as well.
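If it helps to narrow down the size threshold, a simple way to reproduce on my side would be something like this (file names, sizes, and target path are arbitrary):

# generate test files of increasing size and try to upload each one
for size in 100 300 1000; do
  dd if=/dev/urandom of=/tmp/test_${size}mb.bin bs=1M count=${size}
  hdfs dfs -put -f /tmp/test_${size}mb.bin /tmp/
done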