
DatanodeUuid unassigned

I have a 4-node cluster (2 master and 2 data nodes) - a fresh installation.

One of the DataNodes is not coming up. Its log shows:

2018-05-22 14:37:56,024 ERROR datanode.DataNode (BPServiceActor.java:run(780)) - Initialization failed for Block pool <registering> (DatanodeUuid unassigned) service to Host1.infosolco.net/10.215.78.41:8020. Exiting.
java.io.IOException: All specified directories are failed to load.

When I check the VERSION files on both DataNodes, I see:

root@Datanode02:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56

root@Datanode01:/spark/hdfs/data/current # cat VERSION
#Tue May 22 14:00:02 PDT 2018
storageID=DS-0009b75a-e67a-4623-b7a2-12bf395c1d61
clusterID=CID-eb6df30f-7f16-4f94-826c-c7640e1e45a2
cTime=0
datanodeUuid=f005656a-673e-4c97-b25a-e19f04e1ec94
storageType=DATA_NODE
layoutVersion=-56

I see that both DataNodes have the same datanodeUuid, and the 2nd DataNode is not coming up.

Please suggest!

1 REPLY


@Bharath N

Try to perform the following steps on the failed DataNode:

  1. Get the list of DataNode directories from /etc/hadoop/conf/hdfs-site.xml using the following command:
    $ grep -A1 dfs.datanode.data.dir /etc/hadoop/conf/hdfs-site.xml
          <name>dfs.datanode.data.dir</name>
          <value>/data0/hadoop/hdfs/data,/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,
    /data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,
    /data7/hadoop/hdfs/data,/data8/hadoop/hdfs/data,/data9/hadoop/hdfs/data</value>
  2. Get datanodeUuid by grepping the DataNode log:
    $ grep "datanodeUuid=" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log | head -n 1 | 
    perl -ne '/datanodeUuid=(.*?),/ && print "$1\n"'
    1dacef53-aee2-4906-a9ca-4a6629f21347
  3. Copy over a VERSION file from one of the <dfs.datanode.data.dir>/current/ directories of a healthy running DataNode:
    $ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/VERSION ./
  4. Modify the datanodeUuid in the VERSION file with the datanodeUuid from the above grep search:
    $ sed -i.bak -E 's|(datanodeUuid)=(.*$)|\1=1dacef53-aee2-4906-a9ca-4a6629f21347|' VERSION
  5. Blank out the storageID= property in the VERSION file:
    $ sed -i.bak -E 's|(storageID)=(.*$)|\1=|' VERSION
  6. Copy this modified VERSION file to the current/ path of every directory listed in dfs.datanode.data.dir property of hdfs-site.xml:
    $ for i in {0..9}; do cp VERSION /data$i/hadoop/hdfs/data/current/; done
  7. Change the ownership of these VERSION files to hdfs:hdfs and the permissions to 644:
    $ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/VERSION; done
    $ for i in {0..9}; do chmod 644 /data$i/hadoop/hdfs/data/current/VERSION; done
  8. One more level down, there is a different VERSION file located under the Block Pool current folder at:
    /data0/hadoop/hdfs/data/current/BP-*/current/VERSION
    This file does not need to be modified -- just place copies of it in the appropriate directories.
  9. Copy over this particular VERSION file from a healthy DataNode into the current/BP-*/current/ folder for each directory listed in dfs.datanode.data.dir of hdfs-site.xml:
    $ scp <healthy datanode host>:<dfs.datanode.data.dir>/current/BP-*/current/VERSION ./VERSION2
    $ for i in {0..9}; do cp VERSION2 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
  10. Change the ownership of these block pool VERSION files to hdfs:hdfs and the permissions to 644:
    $ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
    $ for i in {0..9}; do chmod 644 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION; done
  11. Restart DataNode from Ambari.
  12. After the restart, the VERSION file located at <dfs.datanode.data.dir>/current/VERSION will have its storageID repopulated with a regenerated ID.
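
For convenience, steps 2 through 10 can be strung together into one small script. The sketch below is not part of the original procedure; it assumes the same ten data directories (/data0 ... /data9) as the example above, a healthy DataNode reachable as dn01.example.com (a placeholder hostname), exactly one BP-* block pool directory per data directory, and that it is run as root on the failed DataNode. Adjust the paths, hostname, and log location to your environment:

    # Rough consolidation of steps 2-10 above; review before running.
    FAILED_UUID=$(grep "datanodeUuid=" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-$(hostname).log \
                  | head -n 1 | perl -ne '/datanodeUuid=(.*?),/ && print "$1\n"')
    echo "datanodeUuid of this node: ${FAILED_UUID}"

    # Fetch both VERSION files from a healthy DataNode (placeholder host dn01.example.com)
    scp dn01.example.com:/data0/hadoop/hdfs/data/current/VERSION ./VERSION
    scp dn01.example.com:/data0/hadoop/hdfs/data/current/BP-*/current/VERSION ./VERSION2

    # Set this node's datanodeUuid and blank out storageID in the top-level VERSION file
    sed -i.bak -E "s|(datanodeUuid)=.*|\1=${FAILED_UUID}|" VERSION
    sed -i.bak -E 's|(storageID)=.*|\1=|' VERSION

    # Distribute both files and fix ownership and permissions
    for i in {0..9}; do
      cp VERSION  /data$i/hadoop/hdfs/data/current/
      cp VERSION2 /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION
      chown hdfs:hdfs /data$i/hadoop/hdfs/data/current/VERSION /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION
      chmod 644 /data$i/hadoop/hdfs/data/current/VERSION /data$i/hadoop/hdfs/data/current/BP-*/current/VERSION
    done

After restarting the DataNode from Ambari, running grep storageID /data0/hadoop/hdfs/data/current/VERSION should show a newly generated storageID.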

If losing the data on this node is not an issue (say, for example, the node was previously in a different cluster, or was out of service for an extended time), then:

  • delete all data and subdirectories under each directory listed in dfs.datanode.data.dir (keep the directory itself, though),
  • restart the DataNode daemon or service (see the sketch below).
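
A minimal sketch of that cleanup, assuming the same /data0 ... /data9 layout used in the steps above and that the block replicas on this node are expendable:

    $ for i in {0..9}; do rm -rf /data$i/hadoop/hdfs/data/*; done      # empty each dfs.datanode.data.dir but keep the directory itself
    $ for i in {0..9}; do chown hdfs:hdfs /data$i/hadoop/hdfs/data; done

Then restart the DataNode from Ambari (or with your cluster's usual service command); on startup it will re-register with the NameNode and generate a fresh storageID, datanodeUuid, and block pool directory layout.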