Created 08-13-2015 08:36 AM
Dear all,
On my HDFS instance I keep running into this error:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ip-10-0-11-18.eu-west-1.compute.internal:50010:DataXceiver error processing READ_BLOCK operation src: /10.0.11.18:48672 dst: /10.0.11.18:50010
java.io.IOException: Replica gen stamp < block genstamp, block=BP-443384405-10.0.11.18-1425373333474:blk_1146735050_73143271, replica=ReplicaWaitingToBeRecovered, blk_1146735050_0, RWR
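To see what the replica the error complains about looks like on disk, I check the block and meta files directly on the DataNode. This is only a rough sketch: /data/dfs/dn stands in for whatever dfs.datanode.data.dir points to on this node.
# locate the replica and its meta file under the DataNode data directories
find /data/dfs/dn -name 'blk_1146735050*'
# the meta file is named blk_<blockId>_<genStamp>.meta, so the file name itself
# shows the generation stamp of the on-disk replica mentioned in the error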
In this forum and elsewhere, the only suggestion I have found is to check the network configuration. iptables is not enabled and there is no other firewall.
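For what it's worth, this is roughly how I verified that, nothing cluster-specific:
# confirm no firewall rules are loaded
sudo iptables -L -n
# confirm the DataNode is listening on its data transfer port
netstat -tlnp | grep 50010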
It's a real problem because every 7-8 hours the system goes down.
Can anyone help me?
Thanks in advance
Created 08-20-2015 08:13 AM
I am adding some graphs of the HDFS log to give more information. As you can see there are some peaks, and sometimes they coincide with an increase in ACK time and other events.
It seems that port 50010 goes down. After that the DataNode restarts, but it is really difficult to use.
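While the peaks happen I also try to watch the DataNode's xceiver threads through its JMX servlet. A rough check, assuming the default DataNode HTTP port 50075 and that XceiverCount is the attribute name:
# number of active DataXceiver threads on the DataNode
curl -s 'http://10.0.11.18:50075/jmx?qry=Hadoop:service=DataNode,name=DataNodeInfo' | grep -i xceiver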
Created 08-26-2015 06:33 AM
I think the crash is due to JVM overhead: looking at the JVM graph, heap usage is constantly around 500 MB, and at the moment of the crash there is a 1 GB peak that causes it (my JVM limit is 1024 MB).
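Assuming that 1024 MB limit is the DataNode heap, I would try raising it in hadoop-env.sh; the value below is only an example sized against the peak in the graph:
# hadoop-env.sh: give the DataNode JVM more headroom than the current 1 GB limit
export HADOOP_DATANODE_OPTS="-Xmx2048m $HADOOP_DATANODE_OPTS"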
In the Solr log, after the crash, I can see only the following errors:
2015-08-26 15:26:09,677 WARN org.apache.solr.cloud.RecoveryStrategy: Stopping recovery for zkNodeName=core_node3core=XXXXXXXX_shard1_replica2
2015-08-26 15:26:09,709 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
2015-08-26 15:26:09,710 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
2015-08-26 15:26:09,710 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
2015-08-26 15:26:19,666 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
While in the HDFS log I see a lot of these errors:
2015-08-26 14:28:24,542 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ip-10-0-11-18.eu-west-1.compute.internal:50010:DataXceiver error processing READ_BLOCK operation src: /10.0.11.18:49587 dst: /10.0.11.18:50010
java.io.IOException: Need 23449 bytes, but only 23282 bytes available
at org.apache.hadoop.hdfs.server.datanode.BlockSender.waitForMinLength(BlockSender.java:442)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:235)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:474)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
at java.lang.Thread.run(Thread.java:745)
2015-08-26 14:50:19,472 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.11.18, datanodeUuid=5ef65e86-2a30-48b9-ad7f-35134403cd82, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster22;nsid=2085326917;c=0):Got exception while serving BP-443384405-10.0.11.18-1425373333474:blk_1154030556_80513125 to /10.0.45.112:36715
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.0.11.18:50010 remote=/10.0.45.112:36715]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:487)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
at java.lang.Thread.run(Thread.java:745)
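Given the 480000 ms write timeout and the READ_BLOCK failures, I am thinking about raising the DataNode transfer thread count and the socket timeouts in hdfs-site.xml. These are example values, not settings I have verified on this cluster:
<!-- hdfs-site.xml: example values only -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value>
</property>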