
DataNode stopped and not starting now with - Failed to add storage for block pool

New Contributor

One DataNode went down, and now it fails to start with the following errors:

WARN common.Storage (DataStorage.java:addStorageLocations(399)) - Failed to add storage for block pool: BP-441779837-135.208.32.109-1458040734038 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038

FATAL datanode.DataNode (BPServiceActor.java:run(878)) - Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to <HOST/IP>:8020. Exiting.

java.io.IOException: All specified directories are failed to load.
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1336)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1301)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:866)
    at java.lang.Thread.run(Thread.java:745)

FATAL datanode.DataNode (BPServiceActor.java:run(878)) - Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to <HOST/IP>:8020. Exiting.

org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 10, volumes configured: 11, volumes failed: 1, volume failures tolerated: 0
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:261)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1349)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1301)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:866)
    at java.lang.Thread.run(Thread.java:745)

WARN datanode.DataNode (BPServiceActor.java:run(899)) - Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to <HOST/IP>:8020

WARN datanode.DataNode (BPServiceActor.java:run(899)) - Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to <HOST/IP>:8020

INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid unassigned)

WARN datanode.DataNode (DataNode.java:secureMain(2417)) - Exiting Datanode

INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0

INFO datanode.DataNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:

3 REPLIES

Re: DataNode stopped and not starting now with - Failed to add storage for block pool

Super Collaborator

The message could be caused by a process that is still (or already) accessing the storage directory. Check whether this is the case with:

lsof | grep /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038

The first three columns are:

  • command
  • process id
  • user

If a process is locking the directory, this should help you identify it.
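A hypothetical example of what a match might look like (the command name, PID, and user here are illustrative, not taken from this cluster):

java      12345   hdfs   ...   /opt/app/data11/hadoop/hdfs/data/current/BP-441779837-135.208.32.109-1458040734038/...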

Re: DataNode stopped and not starting now with - Failed to add storage for block pool

Super Collaborator

One question: have you performed an upgrade of HDFS?
You may also want to check with:

hdfs fsck / -includeSnapshots
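
If a full fsck takes too long, you can narrow it to a path and ask for more detail; as a sketch (the path is just an example, the flags are standard fsck options):

hdfs fsck /user/example -files -blocks -locations

That lists every file under the path together with its blocks and the DataNodes holding each replica, which helps confirm whether the blocks stored on the failed node are still available elsewhere.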

Re: DataNode stopped and not starting now with - Failed to add storage for block pool

New Contributor

Thanks, Harald, for your input.

While investigating further, we found that one disk on this DataNode host was not healthy (it had gone read-only). After replacing the disk, the issue was resolved. The disk failure tolerance on the cluster was also set to 0, which is why this DataNode would not come up with even a single failed disk.

We didn't perform any upgrade recently.
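
In case it helps someone hitting the same symptom, two checks that match what we found (the awk one-liner is a sketch for Linux hosts, and the property value below is illustrative):

awk '$4 ~ /(^|,)ro(,|$)/ {print $2}' /proc/mounts

prints any mount points that have gone read-only. The "volume failures tolerated: 0" in the exception corresponds to dfs.datanode.failed.volumes.tolerated in hdfs-site.xml; raising it (e.g. to 1, as below) would let the DataNode start despite one failed data volume, at the cost of masking the disk problem:

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>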
