Unable to write to hdfs

Contributor

I am not able to load files into HDFS; I get the following error. The file size is 300 MB. Splitting the file into smaller pieces mostly works, but with sporadic errors.

All datanodes DatanodeInfoWithStorage[10.11.12.11:50010,DS-835fbe86-c1f5-4967-80a4-1e84e7854425,DISK] are bad.


d-98a8-4c9c-9bcc-de6ada2d290c,DISK]: bad datanode DatanodeInfoWithStorage[10.11.12.14:50010,DS-e6fd4e6d-98a8-4c9c-9bcc-de6ada2d290c,DISK]
17/10/30 15:17:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2464)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1461)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1302)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:999)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:506)
17/10/30 15:17:45 WARN hdfs.DFSClient: Error Recovery for block BP-2139487625-10.70.12.115-1447775100056:blk_1073910746_170708 in pipeline DatanodeInfoWithStorage[10.11.12.11:50010,DS-43a97303-ce81-4953-9adb-560131c8a440,DISK], DatanodeInfoWithStorage[10.11.12.12:50010,DS-78bef79e-6a05-4118-9ccd-fe10a88df453,DISK]: bad datanode DatanodeInfoWithStorage[:50010,DS-43a97303-ce81-4953-9adb-560131c8a440,DISK]
17/10/30 15:18:50 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.70.12.119:54084 remote=/10.70.12.118:50010]
  at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
  at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
  at java.io.FilterInputStream.read(FilterInputStream.java:83)
  at java.io.FilterInputStream.read(FilterInputStream.java:83)
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2462)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1461)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1302)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:999)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:506)
put: All datanodes DatanodeInfoWithStorage[10.11.12.11:50010,DS-78bef79e-6a05-4118-9ccd-fe10a88df453,DISK] are bad. Aborting...
5 REPLIES

Master Mentor

@Eon kitex

Can you please check and share the complete DataNode log?

It should show the actual cause of the issue, for example a "too many open files" error in your DataNode logs.
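A minimal way to do that check, assuming an HDP-style install where the DataNode runs as the hdfs user and logs under /var/log/hadoop/hdfs (adjust both for your environment):

# Open-file limit for the user running the DataNode (assumed to be hdfs)
su - hdfs -c 'ulimit -n'

# Scan the DataNode log for "too many open files" and other recent errors
# (the log path is an assumption; adjust to your installation)
grep -iE 'too many open files|ERROR|Exception' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | tail -n 50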

Contributor

Here is the DataNode log (logs.txt):

java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1590)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1525)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1438)
at java.lang.Thread.run(Thread.java:745)
2017-10-30 19:04:52,235 INFO datanode.DataNode (BlockReceiver.java:run(1449)) - PacketResponder: BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.11.12.11:50010, 10.11.12.12:50010]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1590)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1525)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1438)
at java.lang.Thread.run(Thread.java:745)
2017-10-30 19:04:52,236 INFO datanode.DataNode (BlockReceiver.java:run(1463)) - PacketResponder: BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.11.12.11:50010, 10.11.12.12:50010] terminating
2017-10-30 19:04:52,242 INFO datanode.DataNode (DataXceiver.java:writeBlock(669)) - Receiving BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751 src: /10.11.12.15:48177 dest: /10.11.12.15:50010
2017-10-30 19:04:52,243 INFO impl.FsDatasetImpl (FsDatasetImpl.java:recoverClose(1306)) - Recover failed close BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751
2017-10-30 19:05:52,248 ERROR datanode.DataNode (DataXceiver.java:writeBlock(787)) - DataNode{data=FSDataset{dirpath='[/hadoop/hdfs/data/current, /hdfs/current]'}, localName='comu5.baidu.cn:50010', datanodeUuid='2d2bf8fa-6617-43b5-a379-5d236e6c0987', xmitsInProgress=0}:Exception transfering block BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751 to mirror 10.11.12.11:50010: java.io.EOFException: Premature EOF: no length prefix available
2017-10-30 19:05:52,249 INFO datanode.DataNode (DataXceiver.java:writeBlock(850)) - opWriteBlock BP-2139487625-10.11.12.11-1447775100056:blk_1073910780_170751 received exception java.io.EOFException: Premature EOF: no length prefix available
2017-10-30 19:05:52,249 ERROR datanode.DataNode (DataXceiver.java:run(278)) - comu5.baidu.cn:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.11.12.15:48177 dst: /10.11.12.15:50010
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2464)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:758)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
at java.lang.Thread.run(Thread.java:745)

Master Mentor

@Eon kitex

It looks like you are running your NameNode / DataNodes using the IP address instead of the hostname.

Can you please confirm whether you are using the hostname (FQDN) or the IP address?
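A quick way to confirm this from the shell, assuming you can log in to the client and DataNode hosts (the IP and port below are taken from the error messages above):

# On each node: the FQDN the DataNode should register with
hostname -f

# Forward/reverse resolution for one of the DataNodes seen in the log
getent hosts 10.11.12.11

# Check that the client can actually reach the DataNode transfer port
nc -vz 10.11.12.11 50010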


Contributor

Yes, I am using hostnames. The IPs in the log appear to be resolved automatically from the hostnames.

Contributor

The error happens for larger files, 300 MB and up, although I have successfully uploaded files larger than 5 GB before.
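Given the 65000 ms read timeout in the client log, one sketch of a workaround to try, assuming the write pipeline is simply slow (network or disk) rather than a DataNode actually failing, is to rerun the upload with larger socket timeouts. The property names are standard HDFS client settings; the three-minute values, file name, and destination path below are placeholders:

# Retry the put with longer client/DataNode socket timeouts (values are illustrative only)
hdfs dfs -Ddfs.client.socket-timeout=180000 \
         -Ddfs.datanode.socket.write.timeout=180000 \
         -put bigfile.dat /user/eonkitex/bigfile.dat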