Created 05-22-2018 09:54 PM
I have a 4-node cluster (2 master & 2 data nodes) - a fresh installation.
One of the DataNodes is not coming up:
2018-05-22 14:37:56,024 ERROR datanode.DataNode (BPServiceActor.java:run(780)) - Initialization failed for Block pool <registering> (DatanodeUuid unassigned) service to Host1.infosolco.net/10.215.78.41:8020. Exiting. java.io.IOException: All specified directories are failed to load.
When I look at the VERSION files on both DataNodes:
root@Datanode02:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56
__________________
root@Datanode01:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56
I see that both DataNodes have the same storageID and datanodeUuid, and the second DataNode is not coming up.
Please suggest!
Created 05-23-2018 05:49 AM
Try to perform the following steps on the failed DataNode:
$ grep -A1 dfs.datanode.data.dir /etc/hadoop/conf/hdfs-site.xml
<name>dfs.datanode.data.dir</name>
<value>/data0/hadoop/hdfs/data,/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,/data7/hadoop/hdfs/data,/data8/hadoop/hdfs/data,/data9/hadoop/hdfs/data</value>
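Alternatively, if the hdfs client is on the PATH on that node, you can read the same property without parsing the XML:
$ hdfs getconf -confKey dfs.datanode.data.dir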
$ grep "datanodeUuid=" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log | head -n 1 | perl -ne '/datanodeUuid=(.*?),/ && print "$1\n"' 1dacef53-aee2-4906-a9ca-4a6629f21347
$ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/VERSION ./
$ sed -i.bak -E 's|(datanodeUuid)=(.*$)|\1=1dacef53-aee2-4906-a9ca-4a6629f21347|' VERSION
$ sed -i.bak -E 's|(storageID)=(.*$)|\1=|' VERSION
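The first sed stamps in this node's own UUID (the value recovered from the log above); the second blanks storageID, which the DataNode should regenerate on startup. A quick sanity check that the edits took (the UUID shown here is the example value from above - yours will differ):
$ grep -E "storageID|datanodeUuid" VERSION
storageID=
datanodeUuid=1dacef53-aee2-4906-a9ca-4a6629f21347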
$ for i in {0..9}; do cp VERSION /data$i/hadoop/hdfs/data/current/; done
$ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/VERSION; done
$ for i in {0..9}; do chmod 664 /data$i/hadoop/hdfs/data/current/VERSION; done
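A quick way to confirm the ownership and mode landed on all ten copies (assuming the same /data{0..9} layout as above):
$ ls -l /data{0..9}/hadoop/hdfs/data/current/VERSION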
The block pool VERSION file (/data0/hadoop/hdfs/data/current/BP-*/current/VERSION) does not need to be modified -- just place copies of it in the appropriate directories:
$ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/BP-*/current/VERSION ./VERSION2
$ for i in {0..9}; do cp VERSION2 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
$ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
$ for i in {0..9}; do chmod 664 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
If losing any data on this node is not an issue (say, for example, the node was previously in a different cluster, or was out of service for an extended time), then the simpler fix is to wipe the data directories entirely and let the DataNode register from scratch with a fresh identity.
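If you take that route, a minimal sketch (assuming the same /data{0..9} layout as above; stop the DataNode first, and note this permanently deletes every block replica on the node):
$ for i in {0..9}; do rm -rf /data$i/hadoop/hdfs/data/*; done
Either way, restart the DataNode afterwards - via Ambari if this is an Ambari-managed cluster, or manually (the path below assumes the HDP layout; adjust for your distribution):
$ su - hdfs -c "/usr/hdp/current/hadoop-hdfs-datanode/../hadoop/sbin/hadoop-daemon.sh start datanode"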