Support Questions

NameNode Safemode OR Missing_blocks + Under_replicated_blocks


New Contributor

Hi,

 

After an accidental power-off, one of the slave nodes in my cluster (3 nodes: one master and two slaves; slave01 is the one that failed) could not boot. It reported "contains a file system with errors, check forced", so I followed this solution and ran "fsck -f ...":

https://askubuntu.com/questions/955467/dev-sda1-contains-a-file-system-with-errors-check-forced

"fsck -f ..." fixed several files and the desktop came bcak.

 

However, after restarting Cloudera Manager, the NameNode fell into safe mode. I turned safe mode off manually, and two errors appeared: Missing_blocks and Under_replicated_blocks. They claim that 99.999% of the blocks in the cluster are missing, and that 99.999% of the blocks in the cluster need to be replicated. If I restart CM, safe mode comes back.

 

Then I checked the logs from the namenode and both datanodes:

In namenode log:

7:02:29.620 PM  WARN  Server
Requested data length 69250013 is longer than maximum configured RPC length 67108864.  RPC came from 192.168.1.102
7:02:29.621 PM  INFO  Server
Socket Reader #1 for port 8022: readAndProcess from client 192.168.1.102 threw exception [java.io.IOException: Requested data length 69250013 is longer than maximum configured RPC length 67108864.  RPC came from 192.168.1.102]
java.io.IOException: Requested data length 69250013 is longer than maximum configured RPC length 67108864.  RPC came from 192.168.1.102
	at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:896)
	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:752)
	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:723)
7:02:30.167 PM  WARN  Server
Requested data length 69251091 is longer than maximum configured RPC length 67108864.  RPC came from 192.168.1.103
7:02:30.167 PM  INFO  Server
Socket Reader #1 for port 8022: readAndProcess from client 192.168.1.103 threw exception [java.io.IOException: Requested data length 69251091 is longer than maximum configured RPC length 67108864.  RPC came from 192.168.1.103]
java.io.IOException: Requested data length 69251091 is longer than maximum configured RPC length 67108864.  RPC came from 192.168.1.103
	at org.apache.hadoop.ipc.Server$Connection.checkDataLength(Server.java:1610)
	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1672)
	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:896)
	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:752)
	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:723)

 

In the datanode log on slave02, which didn't fail:

7:36:16.878 PM  INFO  DataNode
Unsuccessfully sent block report 0x11a70b7faba74214,  containing 1 storage report(s), of which we sent 0. The reports had 5918818 total blocks and used 0 RPC(s). This took 283 msec to generate and 106 msecs for RPC and NN processing. Got back no commands.
7:36:16.878 PM  WARN  DataNode
IOException in offerService
java.io.EOFException: End of File Exception between local host is: "slave02/192.168.1.103"; destination host is: "master":8022; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
	at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
	at org.apache.hadoop.ipc.Client.call(Client.java:1508)
	at org.apache.hadoop.ipc.Client.call(Client.java:1441)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy23.blockReport(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:204)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:323)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:561)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:695)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

 

In the datanode log on slave01, which failed:

7:38:47.747 PM  INFO  DataNode
Unsuccessfully sent block report 0x519b781f0b8dd8ed,  containing 1 storage report(s), of which we sent 0. The reports had 5918715 total blocks and used 0 RPC(s). This took 496 msec to generate and 100 msecs for RPC and NN processing. Got back no commands.
7:38:47.747 PM  WARN  DataNode
IOException in offerService
java.io.EOFException: End of File Exception between local host is: "slave01/192.168.1.102"; destination host is: "master":8022; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
	at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
	at org.apache.hadoop.ipc.Client.call(Client.java:1508)
	at org.apache.hadoop.ipc.Client.call(Client.java:1441)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy23.blockReport(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:204)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:323)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:561)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:695)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)

 

 

 

The events above repeat continuously.

 

As I understand it, the system wants to re-replicate the missing blocks but cannot succeed because:

Requested data length 69250013 (this number varies with the block count) is longer than maximum configured RPC length 67108864.
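A quick back-of-the-envelope check with the figures from the logs above supports this reading (a sketch of my own; the per-block size is just the ratio of the two logged numbers):

```python
# Default maximum RPC payload the NameNode accepts (64 MiB).
MAX_RPC_LEN = 64 * 1024 * 1024  # = 67108864, the limit named in the logs

# Figures taken from slave01's block report in the logs above.
report_len = 69250013    # "Requested data length" of the rejected report
total_blocks = 5918715   # total blocks in that block report

# The serialized block report no longer fits in a single RPC message.
print(report_len > MAX_RPC_LEN)                 # True
# Approximate serialized size contributed by each block.
print(round(report_len / total_blocks, 1))      # 11.7 (bytes per block)
```

So at roughly 5.9 million blocks per datanode, the full block report is about 66 MiB of serialized data, just over the 64 MiB default, which matches the errors exactly.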

 

I checked online, and someone suggests changing the configuration "ipc.maximum.data.length" in core-default.xml (https://community.hortonworks.com/questions/101841/issue-requested-data-length-146629817-is-longer-t...)

But I'm using CDH 5.13 with Hadoop 2.6, and "ipc.maximum.data.length" was only introduced in Hadoop 2.8, so I can't find it in the CM configuration pages.

Can I add this property myself somewhere, either for the NameNode alone or for the entire HDFS service? How and where can I add it?
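For reference, here is what I would try if a free-form override is possible: Cloudera Manager has an HDFS "Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml" field where unmanaged properties can be pasted (assuming that field is available in CDH 5.13, and that Hadoop 2.6 actually reads this key — I haven't confirmed either):

```xml
<property>
  <name>ipc.maximum.data.length</name>
  <!-- 128 MiB; the default is 67108864 (64 MiB) -->
  <value>134217728</value>
</property>
```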

 

Then I found a similar question asked in our community by RakeshE: https://community.cloudera.com/t5/Storage-Random-Access-HDFS/ISSUE-Requested-data-length-146629817-i...

 

The solution given by weichiu says the problem cannot be solved by adjusting "ipc.maximum.data.length"; instead, one should delete small files to decrease the block count and rebalance. I also have around 6 million blocks, but I would first need to be able to read and write them before I can delete any of them.
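weichiu's small-file point is essentially arithmetic: HDFS allocates at least one block per file, so millions of tiny files inflate the block count (and therefore the block-report size) even when the total data volume is small. A toy illustration (my own sketch, assuming the default 128 MiB block size):

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MiB)

def blocks_for(file_size_bytes: int) -> int:
    """Number of HDFS blocks a file occupies (always at least one)."""
    return max(1, math.ceil(file_size_bytes / BLOCK_SIZE))

# One 1 GiB file occupies 8 blocks...
print(blocks_for(1024 ** 3))                        # 8
# ...while a thousand 1 KiB files occupy 1000 blocks for far less data.
print(sum(blocks_for(1024) for _ in range(1000)))   # 1000
```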

 

Please give me some suggestions on what I should do to fix the cluster. Thanks in advance!

 

2 REPLIES

Re: NameNode Safemode OR Missing_blocks + Under_replicated_blocks

Rising Star

Hi,

 

 

Did you try adding the IPC parameter to the core-site.xml file:

 

<property>
        <name>ipc.maximum.data.length</name>
        <value>134217728</value>
</property>

 

Then restart your agent.

Re: NameNode Safemode OR Missing_blocks + Under_replicated_blocks

Explorer
In CDH 5.14, adding the property worked! Thanks.