Support Questions

tabata · 11-25-2015

Hi,

Thank you for releasing CDH 5.5.0.

I upgraded our Hadoop cluster from CDH 5.4.8 to CDH 5.5.0, and afterwards HDFS became CORRUPT.

According to the DataNode log, the DataNode fails to send its block report because of a protobuf size limit, as shown below.

BlockListAsLongs was changed substantially in CDH 5.5.0, and this seems to be the cause.

Following "Install and Upgrade Known Issues", I have already set ipc.maximum.data.length to 134217728, because I previously hit the maximum configured RPC length error.

Can I solve this problem without hacking the protobuf jar?

[datanode protobuf size limit error log]

2015-11-25 22:30:15,259 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x557774ffa491b8ce,  containing 1 storage report(s), of which we sent 0. The reports had 5822491 total blocks and used 0 RPC(s). This took 518 msec to generate and 3776 msecs for RPC and NN processing. Got back no commands.
2015-11-25 22:30:15,259 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
	at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:332)
	at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:310)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processFirstBlockReport(BlockManager.java:2108)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1852)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1191)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:164)
	at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28853)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
	at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
	at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
	at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
	at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
	at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:328)
	... 14 more

	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at com.sun.proxy.$Proxy14.blockReport(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:201)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:470)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:707)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:847)
	at java.lang.Thread.run(Thread.java:745)
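
For a rough sense of scale, the sketch below estimates how large a single storage's block list could get for the 5822491 blocks in the log above, compared with the 64 MB default size limit of protobuf's CodedInputStream. It assumes each block is encoded as three varint64 values (block ID, length, generation stamp); that layout and the class and method names are my assumptions for illustration, not taken from the CDH source.

```java
public class BlockReportSizeEstimate {
    // Default CodedInputStream size limit in protobuf-java 2.x: 64 MB.
    static final long PROTOBUF_DEFAULT_LIMIT = 64L << 20;

    // Assumption: three varint64 fields per block (ID, length, generation stamp).
    static final int VARINTS_PER_BLOCK = 3;

    // Number of bytes protobuf needs to encode a value as a varint64.
    static int varintSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7; // unsigned shift, so negative values take 10 bytes
            size++;
        }
        return size;
    }

    // Upper bound: every field takes the maximum varint64 width.
    static long worstCaseBytes(long blocks) {
        return blocks * VARINTS_PER_BLOCK * varintSize(-1L);
    }

    public static void main(String[] args) {
        long blocks = 5_822_491L; // "total blocks" from the DataNode log
        long worst = worstCaseBytes(blocks);
        System.out.println("worst-case bytes: " + worst);
        System.out.println("protobuf limit:   " + PROTOBUF_DEFAULT_LIMIT);
        System.out.println("exceeds limit:    " + (worst > PROTOBUF_DEFAULT_LIMIT));
    }
}
```

The real size depends on the actual block IDs, lengths and generation stamps, so this is only an upper bound. It does suggest that a single storage holding several million blocks can plausibly cross the limit even when ipc.maximum.data.length has been raised.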

tabata · 11-26-2015

Following HDFS-5153, I applied a workaround.

https://issues.apache.org/jira/browse/HDFS-5153

One of the causes is that I had set dfs.datanode.data.dir to only one directory.

So I added several more storage directories and rebalanced the data manually.

After that, I could start HDFS with no corruption.

However, if this behavior is not intended, I hope the block report function will be fixed.

A better solution would also be welcome.
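
As a rough sizing aid for this workaround, the sketch below estimates how many storage directories are needed so that each storage's block list stays under a given size limit. It assumes blocks are spread evenly across storages and that each storage's list is checked against the limit separately; the 30 bytes per block is a worst-case guess (three 10-byte varints), and the class and method names are hypothetical.

```java
public class StorageSplitEstimate {
    // Smallest number of storages such that an even share of the blocks,
    // at bytesPerBlock each, fits within limitBytes per storage.
    static long minStorages(long totalBlocks, int bytesPerBlock, long limitBytes) {
        long maxBlocksPerStorage = limitBytes / bytesPerBlock;
        if (maxBlocksPerStorage <= 0) {
            throw new IllegalArgumentException("limit is smaller than one encoded block");
        }
        // Ceiling division without floating point.
        return (totalBlocks + maxBlocksPerStorage - 1) / maxBlocksPerStorage;
    }

    public static void main(String[] args) {
        long totalBlocks = 5_822_491L;  // from the DataNode log
        long limitBytes = 64L << 20;    // protobuf 2.x default size limit
        int worstCaseBytesPerBlock = 30; // assumption: 3 varints x 10 bytes
        System.out.println("storages needed (worst case): "
                + minStorages(totalBlocks, worstCaseBytesPerBlock, limitBytes));
    }
}
```

In practice the data is never perfectly balanced, so it seems safer to leave some headroom above whatever number this gives.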

