DataNode was removed from Ambari due to a rack awareness topology error

Contributor

Good Morning Experts,

I am currently facing the following issue:

We have a 200-node Hadoop cluster with rack awareness configured. We suddenly noticed that one of the DataNodes was missing from Ambari, although the DataNode process was still running on that node. When we looked at the logs, we found the following error:

Initialization failed for Block pool BP-3x84848-92929299 (Datanode Uuid 6048438486-d001-47af-a899-6493aca15c4c) service to hostname.com/<data node ip>:8020 Failed to add /default-rack/<datanode ip>:1019: You cannot have a rack and a non-rack node at the same level of the network topology.

We added the DataNode again from Ambari, but after starting it, it still reports the above error in the DataNode logs. I didn't see any similar question in the community, so I am looking for your help.

Since this is currently an issue in our production cluster, can anyone please help me quickly?

I greatly appreciate your quick help.

thanks,

~hdpadmin

3 REPLIES

Check your rack setting for the DataNode. If you don't see the problem, you can post the output of the following command and someone may be able to point out the error.

hdfs dfsadmin -report
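On a large cluster the full report is long, so it can help to reduce it to one line per DataNode before posting or reviewing it. The following is only a minimal sketch: it assumes each DataNode section of the report starts with a `Name:` line and contains a `Rack:` line only when a rack other than the default is assigned, which may differ between Hadoop versions. `report_racks` is a hypothetical helper name, not an HDFS command.

```shell
# Hypothetical helper: summarize "hdfs dfsadmin -report" output read from stdin.
# Assumes each DataNode section starts with "Name: <ip:port> ..." and may
# contain a "Rack: <path>" line; adjust the patterns if your output differs.
report_racks() {
  awk '
    /^Name:/ { if (name != "") print name, rack; name = $2; rack = "(no Rack line)" }
    /^Rack:/ { rack = $2 }
    END      { if (name != "") print name, rack }
  '
}

# Usage on a node with the HDFS client:
#   hdfs dfsadmin -report | report_racks
```

A DataNode that shows up without a rack while its neighbors have one is the first thing to look at. `hdfs dfsadmin -printTopology` gives a similar rack-by-rack view if it is available in your version.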

Contributor

Hi Arpit, thank you for the response. Our issue was actually resolved after refreshing the client configs on the NameNode host. It looks like the NameNode had cached the old configuration for the DataNode. HW support asked us to restart the NameNode or, if that was not possible, to at least refresh the client configs. We refreshed the client configs first, and that resolved our issue.

Cloudera Employee

Hi @hdpadmin overlandpark, thanks for working with me over the Support portal. I will put the steps here for others' reference:

The topology information is cached in the NameNode (NN), so whenever the topology changes, we need to restart the NN to clear it.

In our case, the DataNode was originally registered without a rack (default-rack), and due to caching, the NN will not allow it to join when it is online.

So to fix this issue, we have to restart the NameNode, which will pick up the current, updated rack information.

To avoid a NN restart, there is a workaround to add a new DataNode to a specific Rack without the need of restarting the NameNode:

a. Add the new node in Ambari without choosing the 'DataNode' component.

b. From the Ambari Hosts tab, select the new node and add the 'DataNode' component.

c. Click 'Host Actions' -> 'Set Rack' and specify the required rack name.

d. From the Ambari Hosts tab, go to the NameNode host and refresh the client configs using the drop-down menu next to Clients. This step updates /etc/hadoop/conf/topology_mappings.data on the NameNode with the new topology information for the DataNode.

e. Start the DataNode service on the new node.

f. Once the DataNode is up, confirm its rack topology information by running the following command on any node with the HDFS client:

# su - hdfs

# hdfs dfsadmin -report

In short, the topology should be updated before a DataNode is started (at which point it tries to register with the NN).
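That ordering can be checked on the NameNode host before the DataNode is started. The following is only a rough sketch: it assumes /etc/hadoop/conf/topology_mappings.data contains lines of the form `<host-or-ip>=<rack-path>`, which is an assumption about the file format, so inspect the file on your own NameNode first. `rack_for_host` is a hypothetical helper, not part of Hadoop or Ambari.

```shell
# Hypothetical helper: print the rack mapped to a host or IP in a mapping file.
# Assumes lines of the form "<host-or-ip>=<rack-path>"; other lines are ignored.
# Exits non-zero when the host has no entry.
rack_for_host() {
  file=$1
  host=$2
  awk -F= -v h="$host" '
    $1 == h { print $2; found = 1; exit }
    END     { exit !found }
  ' "$file"
}

# Usage on the NameNode host, before starting the new DataNode
# (dn1.example.com is a placeholder hostname):
#   rack_for_host /etc/hadoop/conf/topology_mappings.data dn1.example.com \
#     || echo "no rack mapping yet: refresh client configs first"
```

If the helper prints nothing for the new node, the mapping has not reached the NameNode yet, and starting the DataNode at that point would likely register it under the default rack again.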