Created 11-20-2017 10:34 PM
We tried to start the DataNode and got the errors below.
We then tried to resolve it by removing /.../sdXX/hadoop/hdfs/data on all partitions and recreating the data directories, but we still get the same errors.
Please advise what needs to be fixed on our worker machine so the DataNode starts correctly.
2017-11-20 22:20:44,907 WARN common.Storage (DataStorage.java:loadBlockPoolSliceStorage(502)) - Failed to add storage directory [DISK]file:/wrk/sdd/hadoop/hdfs/data/ for block pool BP-2098469986-109.64.28.153-1497173237387
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /wrk/sdd/hadoop/hdfs/data/current/BP-2098469986-109.64.28.153-1497173237387
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:218)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:251)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:490)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:419)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1543)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1504)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:269)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:760)
    at java.lang.Thread.run(Thread.java:745)
2017-11-20 22:20:44,971 INFO common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(250)) - Analyzing storage directories for bpid BP-2098469986-109.64.28.153-1497173237387
2017-11-20 22:20:44,971 WARN common.Storage (BlockPoolSliceStorage.java:loadBpStorageDirectories(227)) - Failed to analyze storage directories for block pool BP-2098469986-109.64.28.153-1497173237387
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /wrk/sde/hadoop/hdfs/data/current/BP-2098469986-109.64.28.153-1497173237387
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:218)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:251)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:490)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:419)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1543)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1504)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:269)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:760)
    at java.lang.Thread.run(Thread.java:745)
2017-11-20 22:20:44,972 WARN common.Storage (DataStorage.java:loadBlockPoolSliceStorage(502)) - Failed to add storage directory [DISK]file:/wrk/sde/hadoop/hdfs/data/ for block pool BP-2098469986-109.64.28.153-1497173237387
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /wrk/sde/hadoop/hdfs/data/current/BP-2098469986-109.64.28.153-1497173237387
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:218)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:251)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:490)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:419)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1543)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1504)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:269)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:760)
    at java.lang.Thread.run(Thread.java:745)
2017-11-20 22:20:44,972 ERROR datanode.DataNode (BPServiceActor.java:run(772)) - Initialization failed for Block pool <registering> (Datanode Uuid ad0af75b-e973-475b-b525-52974df91fd1) service to master03.sys774.com/10.164.27.162:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:596)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1543)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1504)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:269)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:760)
    at java.lang.Thread.run(Thread.java:745)
2017-11-20 22:20:44,973 WARN datanode.DataNode (BPServiceActor.java:run(796)) - Ending block pool service for: Block pool <registering> (Datanode Uuid ad0af75b-e973-475b-b525-52974df91fd1) service to master03.sys774.com/10.164.27.162:8020
2017-11-20 22:20:44,973 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid ad0af75b-e973-475b-b525-52974df91fd1)
2017-11-20 22:20:46,974 WARN datanode.DataNode (DataNode.java:secureMain(2698)) - Exiting Datanode
2017-11-20 22:20:46,984 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2017-11-20 22:20:46,990 INFO datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at worker05.sys774.com/192.98.12.34
************************************************************/
Created 11-21-2017 11:52 AM
If only one DataNode is failing rather than all of them, the failure may be caused by a volume failure.
You can check the NameNode web UI to see whether you are facing any volume failures:
http://<active-namenode-host>:50070/dfshealth.html#tab-datanode-volume-failures
or just
http://<active-namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState
for the plain JMX data, which shows whether you are hitting volume failures. (It is not always a physical disk error; it can also occur after logical write issues.)
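As a sketch of reading that JMX output programmatically: the snippet below parses the JSON returned by the `/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState` endpoint and pulls out the volume-failure counter. It assumes the bean exposes a `VolumeFailuresTotal` attribute (present in recent Hadoop 2.x releases; your version may differ), and the sample JSON is a hypothetical trimmed response, not output from a real cluster.

```python
import json

def volume_failures(jmx_json: str) -> int:
    """Return the VolumeFailuresTotal counter from FSNamesystemState JMX data.

    jmx_json is the raw JSON body fetched from
    http://<active-namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState
    (e.g. via curl or urllib). Returns 0 if the bean or attribute is absent.
    """
    doc = json.loads(jmx_json)
    for bean in doc.get("beans", []):
        if bean.get("name") == "Hadoop:service=NameNode,name=FSNamesystemState":
            return int(bean.get("VolumeFailuresTotal", 0))
    return 0

# Hypothetical trimmed sample of what the endpoint returns:
sample = """
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=FSNamesystemState",
      "NumLiveDataNodes": 5,
      "VolumeFailuresTotal": 2
    }
  ]
}
"""

print(volume_failures(sample))
```

A nonzero count means the NameNode has registered failed volumes on at least one DataNode, and the per-node breakdown on the volume-failures tab of the web UI will tell you which host and disk to look at.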