Created 11-03-2016 10:56 AM
Using Ambari on a perfectly healthy, working HDP cluster (v2.4), I tried to add a new host running the DataNode service.
However, on starting, the new DataNode runs for only 2-3 seconds and then stops.
Ambari also shows the "DataNodes Live" widget as 3/4.
PS: I have used the same DataNode directories setting on the 3 existing nodes as well as the newly added one.
Here are the logs from the DataNode that is failing to start:
[root@node08 data]# tail -50 /var/log/hadoop/hdfs/hadoop-hdfs-datanode-node08.log
2016-11-03 06:29:01,769 INFO ipc.Server (Server.java:run(906)) - IPC Server Responder: starting
2016-11-03 06:29:01,769 INFO ipc.Server (Server.java:run(746)) - IPC Server listener on 8010: starting
2016-11-03 06:29:02,001 INFO common.Storage (Storage.java:tryLock(715)) - Lock on /hadoopdisk/hadoop/hdfs/data/in_use.lock acquired by nodename 9074@node08.int.xyz.com
2016-11-03 06:29:02,048 INFO common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(241)) - Analyzing storage directories for bpid BP-1435709756-10.131.138.24-1461308845727
2016-11-03 06:29:02,049 INFO common.Storage (Storage.java:lock(675)) - Locking is disabled for /hadoopdisk/hadoop/hdfs/data/current/BP-1435709756-10.131.138.24-1461308845727
2016-11-03 06:29:02,051 INFO datanode.DataNode (DataNode.java:initStorage(1402)) - Setting up storage: nsid=1525277556;bpid=BP-1435709756-10.131.138.24-1461308845727;lv=-56;nsInfo=lv=-63;cid=CID-95c88273-9764-4b48-8453-8cbc07cffc8b;nsid=1525277556;c=0;bpid=BP-1435709756-10.131.138.24-1461308845727;dnuuid=c06d42e7-c0be-458c-a494-015e472b3b49
2016-11-03 06:29:02,065 INFO common.Storage (DataStorage.java:addStorageLocations(379)) - Storage directory [DISK]file:/hadoopdisk/hadoop/hdfs/data/ has already been used.
2016-11-03 06:29:02,100 INFO common.Storage (BlockPoolSliceStorage.java:recoverTransitionRead(241)) - Analyzing storage directories for bpid BP-1435709756-10.131.138.24-1461308845727
2016-11-03 06:29:02,101 WARN common.Storage (BlockPoolSliceStorage.java:loadBpStorageDirectories(219)) - Failed to analyze storage directories for block pool BP-1435709756-10.131.138.24-1461308845727
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /hadoopdisk/hadoop/hdfs/data/current/BP-1435709756-10.131.138.24-1461308845727
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:210)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:242)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:394)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:476)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1399)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1364)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:821)
    at java.lang.Thread.run(Thread.java:745)
2016-11-03 06:29:02,103 WARN common.Storage (DataStorage.java:addStorageLocations(397)) - Failed to add storage for block pool: BP-1435709756-10.131.138.24-1461308845727 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /hadoopdisk/hadoop/hdfs/data/current/BP-1435709756-10.131.138.24-1461308845727
2016-11-03 06:29:02,104 FATAL datanode.DataNode (BPServiceActor.java:run(833)) - Initialization failed for Block pool <registering> (Datanode Uuid c06d42e7-c0be-458c-a494-015e472b3b49) service to node04.int.xyz.com/10.131.138.27:8020. Exiting.
java.io.IOException: All specified directories are failed to load.
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1399)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1364)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:821)
    at java.lang.Thread.run(Thread.java:745)
2016-11-03 06:29:02,104 FATAL datanode.DataNode (BPServiceActor.java:run(833)) - Initialization failed for Block pool <registering> (Datanode Uuid c06d42e7-c0be-458c-a494-015e472b3b49) service to node03.int.xyz.com/10.131.138.24:8020. Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure config value: 1
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:285)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1412)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1364)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:821)
    at java.lang.Thread.run(Thread.java:745)
2016-11-03 06:29:02,104 WARN datanode.DataNode (BPServiceActor.java:run(854)) - Ending block pool service for: Block pool <registering> (Datanode Uuid c06d42e7-c0be-458c-a494-015e472b3b49) service to node04.int.xyz.com/10.131.138.27:8020
2016-11-03 06:29:02,104 WARN datanode.DataNode (BPServiceActor.java:run(854)) - Ending block pool service for: Block pool <registering> (Datanode Uuid c06d42e7-c0be-458c-a494-015e472b3b49) service to node03.int.xyz.com/10.131.138.24:8020
2016-11-03 06:29:02,208 INFO datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid c06d42e7-c0be-458c-a494-015e472b3b49)
2016-11-03 06:29:04,208 WARN datanode.DataNode (DataNode.java:secureMain(2540)) - Exiting Datanode
2016-11-03 06:29:04,212 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2016-11-03 06:29:04,214 INFO datanode.DataNode (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at node08.int.xyz.com/10.131.137.96
************************************************************/
Created 11-06-2016 02:18 PM
Here's how I got it working:
The existing nodes had the DataNode directory set to /hadoopdisk, while the new node used /newdisk (Ambari config groups let you override properties selectively per host).
Later, however, I reverted the DataNode directory on all nodes to /hadoopdisk, and that's when I started getting the errors in the log above.
The resolution was to remove the unused /newdisk directories from the new DataNode. I'm not sure why they caused any issue in the first place, since the DataNode directory property pointed only at /hadoopdisk.
It seems the old directory from the previous property value kept causing the problem (despite the property having been reverted) for as long as the unused directory existed. As soon as it was removed, voila!
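For reference, the cleanup amounted to something like the following. This is only a sketch: the paths are the ones from this thread, and it assumes the DataNode has been stopped in Ambari first.

```shell
# Stale data directory left over from the old dfs.datanode.data.dir
# value. Path matches this thread -- adjust for your cluster.
STALE_DIR=/newdisk/hadoop/hdfs/data

# 1. Confirm it still holds block-pool metadata that the DataNode
#    keeps trying (and failing) to load:
ls "$STALE_DIR/current"

# 2. Move it aside rather than deleting outright, in case anything
#    from it is still needed:
mv "$STALE_DIR" "${STALE_DIR}.bak.$(date +%Y%m%d)"

# 3. Restart the DataNode from Ambari and watch its log:
#    tail -f /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname -s).log
```

Moving the directory instead of deleting it keeps a fallback until the DataNode is confirmed healthy.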
Created 11-03-2016 11:00 AM
I have already checked that the clusterIDs of the NameNode and DataNode match. I have also tried deleting the DataNode data folder and recreating it by running 'hdfs datanode'.
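For anyone else wanting to verify this, here is a minimal sketch of the clusterID comparison. The VERSION file paths in the commented example are illustrative; use the directories from your own dfs.namenode.name.dir and dfs.datanode.data.dir settings.

```shell
# Compare the clusterID field of an HDFS NameNode VERSION file and a
# DataNode VERSION file. Prints a match/mismatch message.
check_cluster_ids() {
  nn_cid=$(grep '^clusterID=' "$1" | cut -d= -f2)
  dn_cid=$(grep '^clusterID=' "$2" | cut -d= -f2)
  if [ "$nn_cid" = "$dn_cid" ]; then
    echo "clusterIDs match: $nn_cid"
  else
    echo "MISMATCH: NN=$nn_cid DN=$dn_cid" >&2
    return 1
  fi
}

# Example invocation (paths are examples only):
# check_cluster_ids /hadoop/hdfs/namenode/current/VERSION \
#                   /hadoopdisk/hadoop/hdfs/data/current/VERSION
```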
Created 11-04-2016 12:22 AM
Can you check whether the file system is clean? Run fsck -y on the devices backing your disk mount points.
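In case it helps anyone following this suggestion: fsck takes a block device rather than a mount point, and the filesystem should be unmounted before checking. A small helper to resolve the device (assumes GNU coreutils df; the /hadoopdisk path in the example is from this thread):

```shell
# Resolve the block device backing a mount point, since fsck operates
# on devices, not mount points.
device_for_mount() {
  df --output=source "$1" | tail -n 1
}

# Example usage -- unmount first, repair, then remount:
# dev=$(device_for_mount /hadoopdisk)
# umount /hadoopdisk && fsck -y "$dev" && mount /hadoopdisk
```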
Created 11-04-2016 09:27 AM
It's resolved now.
Created 11-04-2016 10:41 AM
@AT, can you post the solution? Otherwise this thread is not really useful to others.
Created 11-06-2016 09:27 PM
Thanks a million for sharing. Keep up the spirit!