A common error in initial installations is the following, from the Accumulo TabletServer logs:

Caused by: org.apache.hadoop.ipc.RemoteException( File /apps/accumulo/data/wal/ could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
    at org.apache.hadoop.ipc.RPC$
    at org.apache.hadoop.ipc.Server$Handler$
    at org.apache.hadoop.ipc.Server$Handler$
    at Method)
    at org.apache.hadoop.ipc.Server$
    at org.apache.hadoop.ipc.Client.getRpcResponse(
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(

This exception will be printed repeatedly in the TabletServer logs, because Accumulo has no option but to keep retrying the creation of its write-ahead log file.

This exception indirectly tells us several things about the current state of the cluster:

  1. There are three Datanodes.
  2. None of the Datanodes were excluded from the operation -- all three should have been able to accept the write.
  3. None of the Datanodes successfully accepted the write.

The most common cause of this issue is that each Datanode has very little usable disk space. When Accumulo creates its write-ahead log files, it requests a large HDFS block size (by default, 1GB). If a Datanode does not have enough free space to store 1GB of data, the block allocation on that node fails. When all of the Datanodes are in this situation, you see the above error message.
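The NameNode's decision can be sketched as a simple capacity check across the candidate Datanodes. The following Python sketch is purely illustrative -- the node names and free-space figures are assumptions, not values read from a real cluster -- but it shows why three healthy, non-excluded Datanodes can still yield "replicated to 0 nodes":

```python
# Illustrative model of the block-placement check that fails in the error above.
# Node names and free-space numbers are made up for the example.

WAL_BLOCK_SIZE = 1 * 1024**3  # Accumulo's default write-ahead-log block size: 1 GB

def usable_datanodes(free_bytes_per_node, block_size=WAL_BLOCK_SIZE):
    """Return the Datanodes with enough free space to accept a block of this size."""
    return [node for node, free in free_bytes_per_node.items() if free >= block_size]

# Three Datanodes, none excluded, but each with well under 1 GB free:
cluster = {"dn1": 200 * 1024**2, "dn2": 500 * 1024**2, "dn3": 800 * 1024**2}

targets = usable_datanodes(cluster)
if len(targets) < 1:  # minReplication (=1) in the error message
    print("could only be replicated to 0 nodes instead of minReplication (=1)")
```

Every node is "running" and none is "excluded", yet none can hold a single 1GB block, which matches the wording of the exception exactly.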

The solution is to provide more storage to the Datanodes. Commonly, the root cause is that HDFS is not configured to use the correct data directories, or the intended hard drives were never mounted at those directories (so the Datanodes are writing to the root volume).
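In the common misconfiguration case, the fix is to point each Datanode at its real data disks via `dfs.datanode.data.dir` in `hdfs-site.xml`. The paths below are illustrative, not a recommendation -- use your actual mount points:

```xml
<!-- hdfs-site.xml: point each Datanode at its real data disks.
     The /grid/... paths are examples; substitute your own mount points. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data</value>
</property>
```

A restart of the Datanodes is required for the change to take effect. If adding storage is not possible, Accumulo's WAL block size can also be reduced (see the `tserver.walog.max.size` and `tserver.wal.blocksize` properties in the Accumulo documentation), at the cost of more, smaller WAL blocks.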

Last update: 03-10-2017 11:26 PM