Ambari dashboard - "DataNodes Live" shows 1 live, 0 dead, 0 decommissioning

Explorer

We are using HDP 2.3, Ambari 2.1.2

We have connected 3 datanodes to the cluster (slave1, slave2, slave3)

Ambari server is running on a distinct server.

NameNode is running on slave1.

Dashboard shows:

DataNodes Live 1/3 (0 dead, 0 decommissioning)

NodeManagers Live 3/3

I can access all three data nodes from the Ambari "Hosts" page, and I can restart the services on these data nodes.

The NameNode UI shows 1 live data node (slave1, the same host the NameNode is installed on).

1 ACCEPTED SOLUTION

Explorer

Thank you!

In the datanode log we got the following error:

datanode.DataNode (BPServiceActor.java:run(828)) - Initialization failed for Block pool BP-1743137494-192.168.41.10-1459773600716 (Datanode Uuid 41b4525d-b168-496d-a985-a2c2b5e889c1) service to slave1/192.168.41.10:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=192.168.41.11, hostname=192.168.41.11): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=41b4525d-b168-496d-a985-a2c2b5e889c1, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-655f2b29-eb31-4c17-81b6-b507e7f6cb5f;nsid=1493722973;c=0)

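The problem was reverse DNS resolution, a check the namenode performs when a datanode registers. As the log shows, the namenode could not resolve 192.168.41.11 back to a hostname (hostname=192.168.41.11).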

This article was very helpful: why-datanode-is-denied-communication-with-namenode
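
For anyone hitting the same message, here is a minimal Python sketch of how the failure could be confirmed from the namenode host. It pulls the `ip=`/`hostname=` pair out of the denial line and then tries a reverse lookup with the standard `socket.gethostbyaddr` call. The function names are made up for illustration, and this only approximates the namenode's check; it is not the namenode's actual code.

```python
import re
import socket
from typing import Callable, Iterable, List, Optional, Tuple

# Matches the "(ip=..., hostname=...)" part of the denial message shown above.
_DENIED = re.compile(r"hostname cannot be resolved \(ip=([^,]+), hostname=([^)]+)\)")


def parse_denial(log_line: str) -> Optional[Tuple[str, str]]:
    """Return (ip, hostname) from a denial log line, or None if it does not match."""
    match = _DENIED.search(log_line)
    return (match.group(1), match.group(2)) if match else None


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the hostname the local resolver reports for ip, or None on failure."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def unresolvable(ips: Iterable[str],
                 lookup: Callable[[str], Optional[str]] = reverse_lookup) -> List[str]:
    """Return the IPs whose reverse lookup fails or merely echoes the IP back."""
    bad = []
    for ip in ips:
        name = lookup(ip)
        if name is None or name == ip:
            bad.append(ip)
    return bad
```

Calling `unresolvable(["192.168.41.10", "192.168.41.11"])` on the namenode host should list the addresses that the resolver there cannot map back to a name. In the log above, `parse_denial` returns the same value for both fields, which is the sign that the reverse lookup fell back to the raw IP.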

7 REPLIES 7

Master Guru
@michael sklyar

Can you please run the commands below as the hdfs user and share the output?

#Command 1:

hadoop dfsadmin -report

#Command 2:

hadoop dfsadmin -refreshNodes

After running the commands above, please check your NameNode UI to see whether the live nodes show 3/3. If possible, please share a screenshot of the NN UI.

Explorer

1. report:

Configured Capacity: 51636727808 (48.09 GB)
Present Capacity: 28807827456 (26.83 GB)
DFS Remaining: 28329484288 (26.38 GB)
DFS Used: 478343168 (456.18 MB)
DFS Used%: 1.66%
Under replicated blocks: 1023
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0


-------------------------------------------------
Live datanodes (1):


Name: 192.168.41.10:50010 (stream-srv01.streaming.rnd.qa)
Hostname: stream-srv01.streaming.rnd.qa
Decommission Status : Normal
Configured Capacity: 51636727808 (48.09 GB)
DFS Used: 478343168 (456.18 MB)
Non DFS Used: 22828900352 (21.26 GB)
DFS Remaining: 28329484288 (26.38 GB)
DFS Used%: 0.93%
DFS Remaining%: 54.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun May 08 20:43:40 IDT 2016



2. refresh command output: Refresh nodes successful

3. NameNode UI shows 1 live data node (the one the NameNode is installed on):

(screenshot attached: 4088-namenode.png)
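
A report like the one in item 1 can also be checked by script rather than by eye. The following is a rough Python sketch that reads the "Live datanodes (N):" header and the `Name:` lines from saved `dfsadmin -report` text. The function name is hypothetical, and the assumption that a "Dead datanodes" header may follow the live section is mine; the report in this thread has no such section.

```python
import re
from typing import List, Tuple


def live_datanodes(report: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (declared_count, [(address, hostname), ...]) for the live section."""
    start = report.find("Live datanodes")
    if start == -1:
        return 0, []
    # Assumption: if dead nodes are listed, their section starts with "Dead datanodes".
    end = report.find("Dead datanodes", start)
    section = report[start:] if end == -1 else report[start:end]

    header = re.search(r"^Live datanodes \((\d+)\):", section, re.MULTILINE)
    declared = int(header.group(1)) if header else 0
    # Each node entry begins with a line such as "Name: 192.168.41.10:50010 (host.name)".
    nodes = re.findall(r"^Name: (\S+) \(([^)]+)\)\s*$", section, re.MULTILINE)
    return declared, nodes


if __name__ == "__main__":
    # Excerpt of the report posted above.
    sample = (
        "Live datanodes (1):\n"
        "\n"
        "Name: 192.168.41.10:50010 (stream-srv01.streaming.rnd.qa)\n"
        "Hostname: stream-srv01.streaming.rnd.qa\n"
    )
    declared, nodes = live_datanodes(sample)
    print(declared, nodes)
```

Comparing the returned hostnames with the list of hosts that are supposed to run a datanode shows which ones never registered with the namenode.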

Super Collaborator

Does the datanode service on each slave node appear as started in Ambari? Also, on one of the slave nodes that is not working, check for errors in the /var/log/hadoop/hdfs/<datanodelog> file.

On slave1, check the namenode log file as well to see whether the datanodes are trying to heartbeat to the namenode.

Regards

Pranay Vyas

Super Guru

@michael sklyar

It might be that the datanode process is in a hung state and is not responding to status updates.

Did you try restarting the datanode service?

Can you share the namenode and datanode logs?

Contributor

I deployed a cluster on VMware and, because I was having issues adding an additional node, I cloned a VM, changed its hostname and IP, and added the datanode manually. As I was seeing just one datanode on the dashboard, I tried all the changes suggested here, which didn't work. As soon as I shut down the active datanode, the second one started to show up. Any clues as to what was happening on the back end?

Contributor

My assumption was correct: the datanodes (probably every node) had the same UUID, hence this issue. I removed the installed software, directories, and files, then re-registered the node, which worked fine afterwards.
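
In case it helps someone with cloned VMs, here is a small Python sketch for spotting the duplicate before reinstalling anything. It assumes the `VERSION` file under each datanode data directory's `current/` folder has been copied off every host, and that the file holds a `datanodeUuid=` line, which matches what I have seen on HDP 2.x but should be verified on your version. The function names and host names are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def read_datanode_uuid(version_text: str) -> Optional[str]:
    """Return the datanodeUuid value from the text of a VERSION file, if present."""
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("datanodeUuid="):
            return line.split("=", 1)[1]
    return None


def duplicate_uuids(version_by_host: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each UUID that appears on more than one host to the sorted hosts sharing it."""
    hosts_by_uuid = defaultdict(list)
    for host, text in version_by_host.items():
        uuid = read_datanode_uuid(text)
        if uuid is not None:
            hosts_by_uuid[uuid].append(host)
    return {uuid: sorted(hosts)
            for uuid, hosts in hosts_by_uuid.items()
            if len(hosts) > 1}
```

A non-empty result means two hosts present the same identity to the namenode, which would be consistent with only one of them showing as live at a time.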