
DatanodeUuid unassigned

I have a 4-node cluster (2 master and 2 data nodes) - a fresh installation.

One of the DataNodes is not coming up. Its log shows:

2018-05-22 14:37:56,024 ERROR datanode.DataNode (BPServiceActor.java:run(780)) - Initialization failed for Block pool <registering> (DatanodeUuid unassigned) service to Host1.infosolco.net/10.215.78.41:8020. Exiting.
java.io.IOException: All specified directories are failed to load.

When I check the VERSION files on both DataNodes, I see:

root@Datanode02:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56

root@Datanode01:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56

I see that both DataNodes have the same datanodeUuid, and the 2nd DataNode is not coming up.

Please suggest!

1 REPLY


@Bharath N

Try to perform the following steps on the failed DataNode:

  1. Get the list of DataNode directories from /etc/hadoop/conf/hdfs-site.xml using the following command:
    $ grep -A1 dfs.datanode.data.dir /etc/hadoop/conf/hdfs-site.xml
          <name>dfs.datanode.data.dir</name>
          <value>/data0/hadoop/hdfs/data,/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,
    /data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,
    /data7/hadoop/hdfs/data,/data8/hadoop/hdfs/data,/data9/hadoop/hdfs/data</value>
  2. Get datanodeUuid by grepping the DataNode log:
    $ grep "datanodeUuid=" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log | head -n 1 | 
    perl -ne '/datanodeUuid=(.*?),/ && print "$1\n"'
    1dacef53-aee2-4906-a9ca-4a6629f21347
  3. Copy over a VERSION file from one of the <dfs.datanode.data.dir>/current/ directories of a healthy running DataNode:
    $ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/VERSION ./
  4. Modify the datanodeUuid in the VERSION file with the datanodeUuid from the above grep search:
    $ sed -i.bak -E 's|(datanodeUuid)=(.*$)|\1=1dacef53-aee2-4906-a9ca-4a6629f21347|' VERSION
  5. Blank out the storageID= property in the VERSION file:
    $ sed -i.bak -E 's|(storageID)=(.*$)|\1=|' VERSION
  6. Copy this modified VERSION file to the current/ path of every directory listed in dfs.datanode.data.dir property of hdfs-site.xml:
    $ for i in {0..9}; do cp VERSION /data$i/hadoop/hdfs/data/current/; done
  7. Change the ownership of these VERSION files to hdfs:hdfs and the permissions to 644:
    $ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/VERSION; done
    $ for i in {0..9}; do chmod 644 /data$i/hadoop/hdfs/data/current/VERSION; done
  8. One more level down, there is a different VERSION file located under the Block Pool current folder at:
    /data0/hadoop/hdfs/data/current/BP-*/current/VERSION
    This file does not need to be modified -- just place copies of it in the appropriate directories.
  9. Copy over this particular VERSION file from a healthy DataNode into the current/BP-*/current/ folder for each directory listed in dfs.datanode.data.dir of hdfs-site.xml:
    $ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/BP-*/current/VERSION ./VERSION2
    $ for i in {0..9}; do cp VERSION2 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
  10. Change the ownership of these block pool VERSION files to hdfs:hdfs and the permissions to 644:
    $ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
    $ for i in {0..9}; do chmod 644 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
  11. Restart DataNode from Ambari.
  12. After the restart, the VERSION file located at <dfs.datanode.data.dir>/current/VERSION will have its storageID repopulated with a regenerated ID.
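
For convenience, steps 2 through 10 can be strung together into one small script. The sketch below is not part of the original procedure; it assumes the same ten data directories (/data0 ... /data9) as the example above, a healthy DataNode reachable as dn01.example.com (a placeholder hostname), exactly one BP-* block pool directory per data directory, and that it is run as root on the failed DataNode. Adjust the paths, hostname, and log location to your environment:

    # Rough consolidation of steps 2-10 above; review before running.
    FAILED_UUID=$(grep "datanodeUuid=" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log \
                  | head -n 1 | perl -ne '/datanodeUuid=(.*?),/ && print "$1\n"')
    echo "datanodeUuid of this node: ${FAILED_UUID}"

    # Fetch both VERSION files from a healthy DataNode (placeholder host dn01.example.com)
    scp dn01.example.com:/data0/hadoop/hdfs/data/current/VERSION ./VERSION
    scp dn01.example.com:/data0/hadoop/hdfs/data/current/BP-*/current/VERSION ./VERSION2

    # Set this node's datanodeUuid and blank out storageID in the top-level VERSION file
    sed -i.bak -E "s|(datanodeUuid)=.*|\1=${FAILED_UUID}|" VERSION
    sed -i.bak -E 's|(storageID)=.*|\1=|' VERSION

    # Distribute both files and fix ownership and permissions
    for i in {0..9}; do
      cp VERSION  /data$i/hadoop/hdfs/data/current/
      cp VERSION2 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION
      chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/VERSION /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION
      chmod 644 /data$i/hadoop/hdfs/data/current/VERSION /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION
    done

After restarting the DataNode from Ambari, running grep storageID /data0/hadoop/hdfs/data/current/VERSION should show a newly generated storageID.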

If losing the data on this node is not an issue (say, for example, the node was previously in a different cluster, or was out of service for an extended time), then:

  • delete all data and subdirectories under each directory listed in dfs.datanode.data.dir (keep the directory itself, though),
  • restart the DataNode daemon or service (see the sketch below).
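
A minimal sketch of that cleanup, assuming the same /data0 ... /data9 layout used in the steps above and that the block replicas on this node are expendable:

    $ for i in {0..9}; do rm -rf /data$i/hadoop/hdfs/data/*; done      # empty each dfs.datanode.data.dir but keep the directory itself
    $ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data; done

Then restart the DataNode from Ambari (or with your cluster's usual service command); on startup it will re-register with the NameNode and generate a fresh storageID, datanodeUuid, and block pool directory layout.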