In our Ambari cluster, all DataNodes are up, but the dashboard shows only 3 of 5 as live.
So how / why does the dashboard see only 3 of 5?
What do we need to check or sync here?
* Just to note: two hosts (worker machines) were added recently to the Ambari cluster. We restarted the ambari-agent and rebooted these servers, but the status on the dashboard is still 3/5.
Ambari Server fetches some information, such as the DataNode status, from the NameNode. If the NameNode reports only 3 DataNodes as live, the other 2 DataNodes are not communicating properly with the NameNode. Those DataNodes may well be running (and may have a valid PID file), but they are not communicating with the NameNode, so Ambari is just showing the information it gets from the NameNode.
So at this point we can say that there is no issue on the Ambari side: it is showing the live DataNode information it gets from the NameNode.
To investigate why those DataNodes are not communicating with the NameNode (why the NameNode does not show all 5 nodes as live), we will have to look at the NameNode log as well as the DataNode logs of the problematic DataNodes.
Regarding the Agent communication with Ambari Server:
Unable to reconnect to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com
Please check whether those hosts resolve the Ambari Server hostname and IP address properly. Check the "/etc/hosts" entries on those hosts to verify that the Ambari host resolves correctly.
Also check whether a blocked port or a firewall is preventing those hosts from reaching Ambari Server port 8441.
# cat /etc/hosts
# nc -v master02 8441
(OR)
# telnet master02 8441
Please confirm that "master02" is actually your Ambari Server host. If it is not, check the "/etc/ambari-agent/conf/ambari-agent.ini" file to verify that the Ambari Server hostname is set correctly there.
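To read the configured server name quickly, something like the sketch below may help. It assumes the agent config keeps the server name under a "hostname" key in a "[server]" section, which is the usual layout but worth confirming on your version. The function name agent_server_hostname is made up for this example.

```shell
# Print the "hostname" value from the [server] section of an
# ambari-agent.ini supplied on stdin. Keys named "hostname" in any
# other section are ignored.
# Assumption: the file uses plain "key=value" lines under "[section]" headers.
agent_server_hostname() {
  awk -F= '
    /^\[/ { in_server = ($0 ~ /^\[server\]/); next }
    in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
      gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim spaces around the value
      print $2
      exit
    }
  '
}
```

Usage on a worker host: agent_server_hostname < /etc/ambari-agent/conf/ambari-agent.ini and compare the printed name with the host in the heartbeat URL from the agent log.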
On the two hosts whose DataNodes are not showing as live, please check that the hostname is in lowercase. A mixed-case or uppercase hostname sometimes causes issues in determining the state of the HDFS component.
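A small helper for the lowercase check, as a sketch only. The function name is_lowercase is hypothetical, and it only tests for uppercase letters in the string you pass in.

```shell
# Succeed (exit status 0) when the given name has no uppercase letters,
# fail (exit status 1) otherwise.
is_lowercase() {
  case "$1" in
    *[[:upper:]]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

On each of the two hosts you could run: is_lowercase "$(hostname -f)" || echo "hostname contains uppercase letters"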
Also please remove the DataNode PID file from the "/var/run/hadoop/" directory and then try restarting the DataNode.
Do you see any strange error/warning in the ambari-agent log or the DataNode log on the problematic DataNode hosts?
This can happen when a few DataNodes are not able to communicate with the NameNode, so please check the NameNode UI to see what it reports. This is just to isolate whether the issue is on the Ambari side or the HDFS side.
Sometimes the Ambari Agent is not able to determine the correct PID of the running DataNode process. The agent reads the PID file and compares it with the running DataNode process to verify whether it is running.
So restarting the ambari agent, or removing the PID file and restarting the DataNode process, can help clear a stale PID file.
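Before deleting anything, you can check whether the PID file is actually stale. The sketch below is not what the Ambari Agent itself runs. It only tests whether a process with the recorded PID exists, not that the process is a DataNode. The function name pid_file_is_stale is made up, and the PID file path is passed in because the exact file name under "/var/run/hadoop/" depends on your install. Run it as root or as the user that owns the DataNode process, since kill -0 fails on processes you are not allowed to signal.

```shell
# Exit status: 0 = stale (no process with that PID, or the file content
#                  is empty or not a number)
#              1 = a process with that PID exists
#              2 = PID file is missing or unreadable
pid_file_is_stale() {
  pid_file=$1
  [ -r "$pid_file" ] || return 2
  pid=$(tr -d '[:space:]' < "$pid_file")
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;   # empty or non-numeric content counts as stale
  esac
  # kill -0 sends no signal; it only checks that the PID can be signalled.
  if kill -0 "$pid" 2>/dev/null; then
    return 1
  fi
  return 0
}
```

Example: pid_file_is_stale <pid-file> && echo "stale PID file", where <pid-file> is the DataNode PID file on that host.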
But in this case I suspect the issue is in the communication between the NameNode and the DataNodes.
How do we check the communication between the NameNode and the DataNodes? (Meanwhile, we have ssh between the machines.)
Please check which NameNode is the Active NameNode. Try that IP address in the following URL:
The following should show the DataNodes that are communicating fine with the NameNode:
Or from the Ambari UI --> HDFS --> "Quick Links" --> ("Active NameNode Hostname") --> NameNode UI
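If you prefer the command line over the UI, the NameNode's view can also be read from "hdfs dfsadmin -report", typically run as the hdfs user. The helper below is a sketch that assumes the report contains header lines of the form "Live datanodes (N):" and "Dead datanodes (N):", which can differ between Hadoop versions. The function name datanode_count is hypothetical.

```shell
# Read "hdfs dfsadmin -report" output on stdin and print the number from
# the "<State> datanodes (N):" header line, where State is Live or Dead.
# Prints nothing if no such line is present.
datanode_count() {
  state=$1
  sed -n "s/^${state} datanodes (\([0-9][0-9]*\)):.*\$/\1/p" | head -n 1
}
```

Example: hdfs dfsadmin -report | datanode_count Live. If this prints 3 while 5 DataNode processes are running, the problem is between the DataNodes and the NameNode rather than in Ambari.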