A common error in initial installations is the following, from the Accumulo TabletServer logs:

Caused by: org.apache.hadoop.ipc.RemoteException( File /apps/accumulo/data/wal/ could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
    at org.apache.hadoop.ipc.RPC$
    at org.apache.hadoop.ipc.Server$Handler$
    at org.apache.hadoop.ipc.Server$Handler$
    at Method)
    at org.apache.hadoop.ipc.Server$
    at org.apache.hadoop.ipc.Client.getRpcResponse(
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(

This exception will be printed repeatedly in the TabletServer logs, because Accumulo has no option but to keep retrying the creation of its write-ahead log file.

This exception indirectly tells us several things about the current state of the cluster:

  1. There are three Datanodes.
  2. None of the Datanodes were excluded from the operation -- all three should have been able to accept the write.
  3. None of the Datanodes successfully accepted the write.

The most common cause of this issue is that each Datanode has very little usable disk space. When Accumulo creates its write-ahead log files, it requests a large HDFS block size (by default, 1GB). If a Datanode does not have enough free space to store 1GB of data, the block allocation on that node fails. When all of the Datanodes are in this situation, you see the above error message.
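The NameNode's decision can be sketched as a simple capacity check across the candidate Datanodes. The following Python sketch is purely illustrative -- the node names and free-space figures are assumptions, not values read from a real cluster -- but it shows why three healthy, non-excluded Datanodes can still yield "replicated to 0 nodes":

```python
# Illustrative model of the block-placement check that fails in the error above.
# Node names and free-space numbers are made up for the example.

WAL_BLOCK_SIZE = 1 * 1024**3  # Accumulo's default write-ahead-log block size: 1 GB

def usable_datanodes(free_bytes_per_node, block_size=WAL_BLOCK_SIZE):
    """Return the Datanodes with enough free space to accept a block of this size."""
    return [node for node, free in free_bytes_per_node.items() if free >= block_size]

# Three Datanodes, none excluded, but each with well under 1 GB free:
cluster = {"dn1": 200 * 1024**2, "dn2": 500 * 1024**2, "dn3": 800 * 1024**2}

targets = usable_datanodes(cluster)
if len(targets) < 1:  # minReplication (=1) in the error message
    print("could only be replicated to 0 nodes instead of minReplication (=1)")
```

Every node is "running" and none is "excluded", yet none can hold a single 1GB block, which matches the wording of the exception exactly.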

The solution is to provide more storage to the Datanodes. Commonly, the root cause is that HDFS is not configured to use the correct data directories, or the intended hard drives were never mounted at those directories (so the Datanodes are writing to the root volume).
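In the common misconfiguration case, the fix is to point each Datanode at its real data disks via `dfs.datanode.data.dir` in `hdfs-site.xml`. The paths below are illustrative, not a recommendation -- use your actual mount points:

```xml
<!-- hdfs-site.xml: point each Datanode at its real data disks.
     The /grid/... paths are examples; substitute your own mount points. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data</value>
</property>
```

A restart of the Datanodes is required for the change to take effect. If adding storage is not possible, Accumulo's WAL block size can also be reduced (see the `tserver.walog.max.size` and `tserver.wal.blocksize` properties in the Accumulo documentation), at the cost of more, smaller WAL blocks.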

Last update: 03-10-2017 11:26 PM