All DataNodes are up, but the dashboard shows only 3 of 5 as up

[Attached screenshots: 42738-capture.png, 42739-capture1.png]

In our Ambari cluster, all DataNodes are up, but the dashboard shows only 3 of 5 as up.

So how, or why, does the dashboard see only 3 of 5?

What do we need to check or sync here?

* Note: two hosts (worker machines) were added to the Ambari cluster recently. We restarted the ambari-agent and rebooted these servers, but the dashboard status is still 3/5.

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

DataNode Issue:

Ambari Server fetches some information from the NameNode, such as the DataNode status. Since the NameNode reports only 3 DataNodes as live, the other 2 DataNodes are not communicating properly with the NameNode. Those DataNodes may still be running (the processes may be up and may have a valid PID file), but they are not communicating properly with the NameNode, so Ambari simply shows the information it gets from the NameNode.
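As a sketch of how to see the same numbers Ambari sees, the NameNode publishes its DataNode counts through its HTTP /jmx endpoint. The snippet below assumes the NameNode HTTP port is 50070 (the Hadoop 2.x default used later in this thread) and reads the NumLiveDataNodes and NumDeadDataNodes attributes of the FSNamesystemState bean; the host you pass in is a placeholder for your own NameNode.

```python
import json
import urllib.request

# Assumption: NameNode HTTP port 50070 (Hadoop 2.x default); adjust for your cluster.
FSNS_BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"


def datanode_counts(jmx_payload: dict) -> dict:
    """Extract live/dead DataNode counts from a parsed NameNode /jmx response."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == FSNS_BEAN:
            return {
                "live": bean.get("NumLiveDataNodes"),
                "dead": bean.get("NumDeadDataNodes"),
            }
    raise ValueError("FSNamesystemState bean not found in JMX response")


def fetch_jmx(host: str, port: int = 50070, bean: str = FSNS_BEAN) -> dict:
    """Fetch a single JMX bean from the NameNode's HTTP /jmx endpoint."""
    url = f"http://{host}:{port}/jmx?qry={bean}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, datanode_counts(fetch_jmx("namenode.example.com")) with your own NameNode host (the name here is hypothetical). If the two new DataNodes never registered with the NameNode at all, they may not show up in either count.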

So at this point we can say that there is no issue on the Ambari side: it is showing the live DataNode information it gets from the NameNode.

To investigate why those DataNodes are not communicating properly with the NameNode (why the NameNode does not show all 5 nodes as live), we will have to look at the NameNode log as well as the DataNode logs on the problematic DataNodes.
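A minimal sketch for pulling the recent warnings and errors out of those logs. It assumes the default log4j line layout (date, time, level, ...) and an HDP-style log location such as /var/log/hadoop/hdfs/hadoop-hdfs-datanode-<hostname>.log; both are assumptions, so adjust them to your installation.

```python
import re
from pathlib import Path

# Assumption: default log4j layout, where the level is the third token, e.g.
#   2018-01-01 10:00:00,123 WARN datanode.DataNode: ...
LEVEL_RE = re.compile(r"^\S+ \S+ (ERROR|WARN|FATAL)\b")


def problem_lines(lines, limit=50):
    """Return up to `limit` most recent lines logged at WARN, ERROR or FATAL."""
    hits = [line.rstrip("\n") for line in lines if LEVEL_RE.match(line)]
    return hits[-limit:] if limit > 0 else []


def scan_log(path: str, limit: int = 50):
    """Read a log file (undecodable bytes replaced) and return its recent problem lines."""
    with Path(path).open(errors="replace") as fh:
        return problem_lines(fh, limit)
```

Run scan_log on both the NameNode log and the DataNode log of a missing node; connection or registration errors on the DataNode side usually point at the cause.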

Regarding the agent's communication with the Ambari Server:

Unable to reconnect to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com

Please check whether those hosts resolve the Ambari Server hostname and IP address properly. Check the "/etc/hosts" file entries on those hosts to verify that the Ambari host resolves correctly.
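The hostname check can be sketched as: parse /etc/hosts and compare its entry for the Ambari Server with what the system resolver actually returns. The helper names are hypothetical; "master02" is the host from the agent log above.

```python
import socket


def parse_hosts(text: str) -> dict:
    """Map each hostname/alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # keep the first matching line
    return mapping


def resolve(name: str):
    """Resolve through the system resolver (/etc/hosts and/or DNS); None on failure."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None
```

For example, compare parse_hosts(open("/etc/hosts").read()).get("master02") with resolve("master02") on each problematic host; a mismatch, or None from either, is worth fixing first.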

Also check whether a port block or firewall is preventing those hosts from reaching the Ambari Server on port 8441.

# cat /etc/hosts
# nc -v master02 8441
(OR)
# telnet master02 8441
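If nc or telnet is not installed, the same TCP reachability test can be done from Python. This is a sketch: 8441 is the port from the heartbeat error above, and 8440 is, to our knowledge, Ambari's default registration port (url_port), so it is worth testing as well.

```python
import socket


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like `nc -v host port`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, etc.
        return False
```

For example, port_open("master02", 8441) and port_open("master02", 8440) run from each worker; False on either points at a firewall or routing problem.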

Please confirm that "master02" is actually your Ambari Server host. If it is not, check the "/etc/ambari-agent/conf/ambari-agent.ini" file to verify that the Ambari Server hostname is set correctly there.
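The relevant settings live in the [server] section of ambari-agent.ini (hostname, url_port, secured_url_port). A sketch that reads them with Python's configparser; the fallback values 8440 and 8441 are the usual Ambari defaults, assumed here rather than read from your file.

```python
import configparser


def agent_server_settings(ini_text: str) -> dict:
    """Read the Ambari Server host and ports from ambari-agent.ini content."""
    parser = configparser.ConfigParser(interpolation=None)  # treat '%' literally
    parser.read_string(ini_text)
    server = parser["server"]
    return {
        "hostname": server.get("hostname"),
        "url_port": server.getint("url_port", fallback=8440),
        "secured_url_port": server.getint("secured_url_port", fallback=8441),
    }
```

For example, agent_server_settings(open("/etc/ambari-agent/conf/ambari-agent.ini").read()) on each worker; the hostname should match the real Ambari Server host on every agent.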

21 REPLIES

Master Mentor

@Michael Bronson

On the other two hosts, where the DataNodes are not shown as live, please check whether the hostname is in lowercase. A mixed-case or uppercase hostname sometimes causes issues in determining the state of the HDFS component.
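A small sketch for spotting mixed-case names. As far as we know, the Ambari Agent by default registers the host under its fully qualified name (what socket.getfqdn() returns) unless a custom hostname script is configured, so both the short and the fully qualified names are checked; the helper names are hypothetical.

```python
import socket


def hostname_case_ok(name: str) -> bool:
    """True if the name contains no uppercase letters."""
    return name == name.lower()


def check_local_names() -> dict:
    """Report this host's short and fully qualified names and whether each is lowercase."""
    names = {"hostname": socket.gethostname(), "fqdn": socket.getfqdn()}
    return {label: (value, hostname_case_ok(value)) for label, value in names.items()}
```

Running check_local_names() on each DataNode host shows immediately whether either name contains uppercase letters.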

Also, please remove the DataNode PID file from the "/var/run/hadoop/" directory and then try restarting the DataNode again.

Do you see any strange errors or warnings in the ambari-agent log or the DataNode log on the problematic DataNode hosts?

This can happen when a few DataNodes are not able to communicate with the NameNode, so please check the NameNode UI to see what information it shows. This is just to isolate whether the issue is on the Ambari side or the HDFS side.

http://$NAMENODE_HOST:50070/dfshealth.html#tab-datanode

  • hi Jay, the hostnames are correct on the two nodes. Regarding /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid, is it necessary to remove these files even though we rebooted the two worker machines?
Michael-Bronson

Master Mentor
@Michael Bronson

Sometimes the Ambari Agent is not able to determine the correct PID of the running DataNode process. The agent reads the PID file mentioned above and compares it with the running DataNode process to verify whether it is running.

So restarting the Ambari agent, or removing the PID file and restarting the DataNode process, can help clear any stale PID file.
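A sketch of the kind of check described above (POSIX only): read the PID file and test whether a process with that PID exists, using signal 0, which performs only the existence and permission check. This proves only that some process has that PID, not that it is the DataNode, since PIDs can be reused after a reboot; the function name is hypothetical.

```python
import os


def pid_file_status(path: str) -> str:
    """Classify a PID file as 'missing', 'invalid', 'stale' or 'running'."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "invalid"  # empty or non-numeric contents
    if pid <= 0:
        return "invalid"  # 0 and negative values address process groups, not a process
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is delivered
    except ProcessLookupError:
        return "stale"  # no process with this PID: the file is left over
    except PermissionError:
        return "running"  # process exists but is owned by another user (e.g. hdfs)
    return "running"
```

For example, pid_file_status("/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid"); a result of "stale" means the file can be removed before restarting the DataNode.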

But in this case I suspect it might be an issue with the communication between the NameNode and the DataNodes.

hi Jay, I tried http://103.14.57.93:50070/dfshealth.html#tab-datanode (is this syntax OK? We don't get the page.)

Michael-Bronson

How do we check the communication between the NameNode and the DataNodes? (We do have SSH between the machines.)

Michael-Bronson

Master Mentor

@Michael Bronson

Please check which NameNode is the Active NameNode, and use its IP address in the following URL:

http://$ACTIVE_NAMENODE_IP:50070/dfshealth.html

That page should show the DataNodes that are communicating properly with the NameNode.

Or from the Ambari UI --> HDFS --> "Quick Links" --> ("Active NameNode Hostname") --> NameNode UI
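To find the active NameNode without the UI, each NameNode's /jmx endpoint exposes a NameNodeStatus bean whose State attribute is "active" or "standby". The sketch below assumes HTTP port 50070 and that you pass in the candidate NameNode hosts yourself (any host names are placeholders). On a host with the HDFS client, hdfs haadmin -getServiceState <serviceId> gives the same answer.

```python
import json
import urllib.request

STATUS_BEAN = "Hadoop:service=NameNode,name=NameNodeStatus"


def namenode_state(jmx_payload: dict) -> str:
    """Return the HA state reported in a parsed NameNodeStatus /jmx response."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == STATUS_BEAN:
            return bean.get("State", "unknown")
    return "unknown"


def find_active_namenode(hosts, port: int = 50070, timeout: float = 10.0):
    """Return the first host whose NameNode reports State == 'active', else None."""
    for host in hosts:
        url = f"http://{host}:{port}/jmx?qry={STATUS_BEAN}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if namenode_state(json.load(resp)) == "active":
                    return host
        except (OSError, ValueError):
            continue  # unreachable host, blocked port, or non-JSON reply
    return None
```

If none of the hosts answers, the URL in the browser will not load either, which points at the port or firewall rather than the URL syntax.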

hi Jay, I opened the UI, but I see a lot of information. What do I need to check on the page? Anyway, I see: Live Nodes - 3

Michael-Bronson

Master Mentor

@Michael Bronson

In the NameNode UI you will find the "Live Nodes" link, which shows all the DataNodes that are sending heartbeats to the NameNode properly and communicating well.
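Since the Live Nodes list contains only the nodes that are heartbeating, a quick way to see which two are missing is to diff that list against the hosts you expect. A sketch, assuming the NameNodeInfo JMX bean's LiveNodes attribute (a JSON-encoded string whose keys may be "host" or "host:port" depending on the Hadoop version); the expected host names are hypothetical.

```python
import json

INFO_BEAN = "Hadoop:service=NameNode,name=NameNodeInfo"


def live_datanode_hosts(jmx_payload: dict) -> set:
    """Lowercased hostnames listed in the NameNodeInfo 'LiveNodes' attribute."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == INFO_BEAN:
            # LiveNodes is itself a JSON-encoded string; strip an optional ":port" suffix
            nodes = json.loads(bean.get("LiveNodes") or "{}")
            return {key.rsplit(":", 1)[0].lower() for key in nodes}
    raise ValueError("NameNodeInfo bean not found in JMX response")


def missing_datanodes(expected_hosts, jmx_payload: dict) -> list:
    """Hosts that should run a DataNode but are not reported live by the NameNode."""
    live = live_datanode_hosts(jmx_payload)
    return sorted(h.lower() for h in expected_hosts if h.lower() not in live)
```

Feed it the parsed JSON from http://<active-namenode>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo together with your five worker hostnames; the result names the DataNodes whose logs to inspect.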

yes, I see that:

Live Nodes 3 (Decommissioned: 0)
Michael-Bronson