HDFS Not Responding


Explorer

Dear all,

 

On my HDFS instance I keep seeing this error:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ip-10-0-11-18.eu-west-1.compute.internal:50010:DataXceiver error processing READ_BLOCK operation  src: /10.0.11.18:48672 dst: /10.0.11.18:50010
java.io.IOException: Replica gen stamp < block genstamp, block=BP-443384405-10.0.11.18-1425373333474:blk_1146735050_73143271, replica=ReplicaWaitingToBeRecovered, blk_1146735050_0, RWR
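
For reference, a general way to check block health from the NameNode host (just a sketch; the paths below are examples, not my actual layout):

# List any files with missing or corrupt blocks in the namespace
hdfs fsck / -list-corruptfileblocks

# Show block ids and replica locations for a specific path (example path)
hdfs fsck /solr -files -blocks -locations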

 

In this forum and elsewhere, the only suggestion I have found is to check the network configuration. iptables is not enabled and there is no other firewall; see the checks after this paragraph.
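
The kind of check that can confirm this on the DataNode host (just a sketch, assuming a standard Linux install):

# Rule listing should be empty apart from the default ACCEPT policies
iptables -L -n
iptables -S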

It's a real problem because the system goes down every 7-8 hours.

 

Can anyone help me?

 

Thanks in advance

3 REPLIES

Re: HDFS Not Responding

Explorer

I'm adding some graphs from the HDFS logs to give more information. As you can see there are some peaks, and sometimes they come with an increase in ACK time and other events.

It seems that port 50010 goes down. After that the DataNode restarts, but it's really difficult to use.
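
A quick way to see what is actually happening to port 50010 when this occurs (sketch; run on the DataNode host):

# Is the data-transfer port still listening?
netstat -tln | grep 50010

# Is the DataNode JVM still alive?
jps | grep -i DataNode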

 

[Attached screenshot: Schermata 2015-08-20 alle 16.56.00.png]

Re: HDFS Not Responding

Master Guru
Could you explain further what you mean by "the system goes down"? Do all or some of your HDFS daemons crash? If so, have you checked what the FATAL message in the log is, and/or checked the stdout to ensure it's not a JVM crash for reasons such as an OOME?
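
For example, something along these lines on the DataNode host (the log locations below are assumptions; adjust them to wherever your DataNode role logs actually live):

# FATAL events in the DataNode role log
grep FATAL /var/log/hadoop-hdfs/*DATANODE*.log*

# JVM-level failures in the stdout/stderr files, plus any HotSpot crash reports written there
grep -i OutOfMemoryError /var/log/hadoop-hdfs/*DATANODE*.out* 2>/dev/null
ls /var/log/hadoop-hdfs/hs_err_pid*.log 2>/dev/null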

Re: HDFS Not Responding

Explorer

I think the crash is due to JVM heap pressure: if I look at the JVM memory graph, usage is constantly around 500MB, and at the moment of the crash there is a peak to 1GB, which causes the crash (my JVM limit is 1024MB).
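
If it is the DataNode heap that is hitting the 1024MB ceiling, a minimal sketch of raising it on an unmanaged install (the 2GB value is only an example; on a Cloudera Manager cluster the equivalent is the DataNode Java heap size setting in the HDFS configuration):

# hadoop-env.sh on the DataNode host (example value; assumption: not CM-managed)
export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError"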

 

In the Solr log, after the crash I can only see the following error:

 

2015-08-26 15:26:09,677 WARN org.apache.solr.cloud.RecoveryStrategy: Stopping recovery for zkNodeName=core_node3core=XXXXXXXX_shard1_replica2
2015-08-26 15:26:09,709 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
2015-08-26 15:26:09,710 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
2015-08-26 15:26:09,710 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)
2015-08-26 15:26:19,666 ERROR org.apache.solr.update.UpdateLog: Exception reading versions from log
java.io.IOException: All datanodes 10.0.11.18:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1147)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:945)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:496)

 

 

While in the HDFS log I see a lot of these errors:

 

2015-08-26 14:28:24,542 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ip-10-0-11-18.eu-west-1.compute.internal:50010:DataXceiver error processing READ_BLOCK operation  src: /10.0.11.18:49587 dst: /10.0.11.18:50010
java.io.IOException: Need 23449 bytes, but only 23282 bytes available
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.waitForMinLength(BlockSender.java:442)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:235)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:474)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
        at java.lang.Thread.run(Thread.java:745)

2015-08-26 14:50:19,472 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.11.18, datanodeUuid=5ef65e86-2a30-48b9-ad7f-35134403cd82, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster22;nsid=2085326917;c=0):Got exception while serving BP-443384405-10.0.11.18-1425373333474:blk_1154030556_80513125 to /10.0.45.112:36715
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.0.11.18:50010 remote=/10.0.45.112:36715]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:487)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
        at java.lang.Thread.run(Thread.java:745)
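
One note on the 480000 millis value above: it matches the default of dfs.datanode.socket.write.timeout (8 minutes), so the DataNode really is waiting that long for the client before giving up. A quick way to confirm what the cluster is actually using (sketch):

# Print the effective write timeout as seen by the HDFS configuration on this host
hdfs getconf -confKey dfs.datanode.socket.write.timeout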

 
