Created on 11-23-2017 03:28 PM - edited 08-17-2019 10:20 PM
In our Ambari cluster, all DataNodes are up, but the dashboard shows only 3 of 5 as live.
So how/why does the dashboard see only 3 of 5?
What do we need to check or sync here?
* Note: the two hosts (worker machines) were added to the Ambari cluster recently. We restarted the ambari-agent and rebooted these servers, but the status on the dashboard is still 3/5.
Created 11-23-2017 04:30 PM
DataNode Issue:
Ambari Server fetches some information from the NameNode, such as the DataNode status. Since the NameNode reports only 3 live DataNodes, the other 2 are not communicating properly with it. Those DataNodes might still be running (even with a valid PID file), but as long as they are not talking to the NameNode, Ambari can only show what the NameNode reports.

So at this point there is no issue on the Ambari side: it is showing the live DataNodes as reported by the NameNode.

To investigate why those DataNodes are not communicating with the NameNode (that is, why the NameNode does not show all 5 nodes as live), we need to look at the NameNode log as well as the DataNode logs on the problematic hosts.
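A minimal sketch for checking what the NameNode itself reports, which is roughly the information Ambari displays. It is written for Python 3 and assumes the NameNode web UI is reachable on the Hadoop 2.x default HTTP port 50070; it reads the `NumLiveDataNodes` and `NumDeadDataNodes` attributes of the `FSNamesystemState` JMX bean. `namenode-host` and the helper names are hypothetical placeholders. From a cluster node, `hdfs dfsadmin -report` gives the same live/dead breakdown on the command line.

```python
import json
from urllib.request import urlopen

# Hypothetical NameNode address; 50070 is the Hadoop 2.x default HTTP port.
# Adjust both for your cluster.
NAMENODE_HTTP = "http://namenode-host:50070"
FSN_STATE_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"


def parse_datanode_counts(jmx_body):
    """Return (live, dead) DataNode counts from a NameNode JMX JSON body."""
    bean = json.loads(jmx_body)["beans"][0]
    return bean["NumLiveDataNodes"], bean["NumDeadDataNodes"]


def fetch_datanode_counts(base_url=NAMENODE_HTTP, timeout=10):
    """Ask the NameNode's JMX servlet how many DataNodes it sees as live/dead."""
    with urlopen(base_url + FSN_STATE_QUERY, timeout=timeout) as resp:
        return parse_datanode_counts(resp.read().decode("utf-8"))
```

In this thread the NameNode would be expected to report 3 live DataNodes. The two new workers may show up as dead, or not appear at all if they never registered; either way their DataNode logs are the next place to look.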

Regarding the agent communication with Ambari Server:
Unable to reconnect to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com

Please check whether those hosts resolve the Ambari Server hostname and IP address properly. Check the "/etc/hosts" file on those hosts to verify that the Ambari host resolves correctly.
Also check whether any port blockage or firewall is preventing those hosts from reaching the Ambari Server on port 8441:
# cat /etc/hosts
# nc -v master02 8441
(OR)
# telnet master02 8441
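The `nc`/`telnet` checks can also be scripted, which makes it easy to run the identical test on every worker. This is a Python 3 sketch; `master02` and ports 8440/8441 come from this thread, and `check_tcp` is a hypothetical helper. It returns the IP address the name resolves to (handy when comparing against /etc/hosts) together with the OS error text if the TCP connect fails.

```python
import socket

# Ambari Server host and agent ports as mentioned in this thread.
AMBARI_SERVER = "master02"
AGENT_PORTS = (8440, 8441)


def check_tcp(host, port, timeout=5.0):
    """Resolve `host` and try a TCP connect, similar to `nc -v host port`.

    Returns (ip_address, error). error is None when the connect succeeds,
    otherwise the OS error text, e.g. "[Errno 111] Connection refused".
    Raises socket.gaierror if the name cannot be resolved at all.
    """
    ip = socket.gethostbyname(host)
    try:
        conn = socket.create_connection((ip, port), timeout=timeout)
    except OSError as exc:
        return ip, str(exc)
    conn.close()
    return ip, None
```

On worker06, calling `check_tcp(AMBARI_SERVER, port)` for each port in `AGENT_PORTS` should return the server's IP with `None` when the port is reachable; an error such as `[Errno 111] Connection refused` would match what the agent logged.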

Please confirm that "master02" is actually your Ambari Server host. If it is not, check the "/etc/ambari-agent/conf/ambari-agent.ini" file to verify that the Ambari Server hostname is set correctly there.
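To double-check which server the agent is actually configured to contact, the `[server]` section of ambari-agent.ini can be read directly. This is a Python 3 sketch that assumes the usual `hostname`, `url_port` and `secured_url_port` keys in that section (verify against your own file); `agent_server_settings` is a hypothetical helper.

```python
import configparser

AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"


def agent_server_settings(path=AGENT_INI):
    """Return (hostname, url_port, secured_url_port) from the [server] section."""
    # interpolation=None so any '%' characters elsewhere in the file
    # are not treated as interpolation syntax.
    cfg = configparser.ConfigParser(interpolation=None)
    if not cfg.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    server = cfg["server"]
    return (server.get("hostname"),
            server.getint("url_port"),
            server.getint("secured_url_port"))
```

On the two new workers, the hostname would be expected to be `master02` and the secured port `8441`, matching the heartbeat URL in the agent log.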
Created 11-23-2017 04:20 PM
From the agent log on one of the workers I see:
ERROR 2017-11-23 16:11:07,601 Controller.py:456 - Unable to reconnect to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com (attempts=5, details=Request to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com failed due to [Errno 111] Connection refused)
Created 11-23-2017 04:36 PM
Yes, master02 is actually our Ambari Server host.
Created 11-23-2017 04:42 PM
I checked via telnet: port 8441 is open and host resolution is OK.
Created 11-23-2017 04:48 PM
Hi Jay, what can we do next?
Created 11-23-2017 05:19 PM
We see two issues here:
Issue-1). DataNode live status issue. This is on the HDFS side, because the NameNode is showing only 3 live DataNodes instead of 5 out of 5.
>>> To investigate that issue we will need the following:
a. NameNode logs (complete log)
b. DataNode logs from the problematic hosts (complete log)

Issue-2). The agent is reporting Connection refused for https://master02:8441. This can also be related to an OpenSSL/Python issue, because the Ambari agent communicates with the Ambari Server over HTTPS on ports 8440 and 8441 using the Python and OpenSSL libraries.
>>> So we will need to see the ambari-server.log as well as the complete ambari-agent.log to get more details about this issue. We also need to check whether the OpenSSL/Python versions and OS versions are the same on all hosts.
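Since both issues come down to reading complete logs, a small filter can pull out just the warning and error lines for a first pass. This is a Python 3 sketch; the level regex is an assumption that covers the ambari-agent format seen in this thread (`ERROR 2017-11-23 16:11:07,601 Controller.py:456 - ...`) and log4j-style lines where the level follows the timestamp, as in Hadoop daemon logs. It can also match the word ERROR inside a message body, which is acceptable for quick triage but means it does not replace reading the full log.

```python
import re
from collections import deque

# Level token in either "ERROR 2017-..." (ambari-agent) or
# "2017-... ERROR ..." (log4j-style) lines.
LEVEL_RE = re.compile(r"\b(ERROR|FATAL|WARN(?:ING)?)\b")


def problem_lines(lines, keep_last=50):
    """Return the last `keep_last` lines that carry a WARN/ERROR/FATAL level."""
    hits = deque(maxlen=keep_last)  # keeps only the most recent matches
    for line in lines:
        if LEVEL_RE.search(line):
            hits.append(line.rstrip("\n"))
    return list(hits)
```

For example, pass an open `/var/log/ambari-agent/ambari-agent.log` or `/var/log/ambari-server/ambari-server.log` (the default locations) to `problem_lines`. NameNode and DataNode logs on HDP usually live under `/var/log/hadoop/hdfs/`, but check the HDFS log directory configured in Ambari.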
Created 11-23-2017 05:26 PM
I checked the logs on the ambari-server. There are a lot of details, but I see: --> Unable to propagate version for ServiceHostComponent on component: SPARK2_CLIENT, host: worker06.sys674.com. Error:
Created 11-23-2017 05:43 PM
How do I verify the OpenSSL/Python version?
Created 11-23-2017 05:44 PM
python -V
Python 2.7.5
This version is the same on the Ambari Server and the worker machines.
Created 11-23-2017 05:46 PM
openssl version
OpenSSL 1.0.1e-fips 11 Feb 2013
It is the same on all hosts.
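`python -V` and `openssl version` check the command-line tools, while the agent uses whatever OpenSSL library its Python interpreter's `ssl` module is linked against, which can differ from the `openssl` binary. This sketch reports both from inside Python. Unlike the other snippets here it is meant to run on Python 2.7 as well as 3, so it can be run with the interpreter the agent uses (assumed to be the system `python`); `runtime_versions` is a hypothetical helper.

```python
import platform
import socket
import ssl


def runtime_versions():
    """Report the versions that matter for agent <-> server HTTPS."""
    return {
        "host": socket.gethostname(),
        "python": platform.python_version(),
        # OpenSSL linked into this interpreter's ssl module.
        "openssl": ssl.OPENSSL_VERSION,
        "os": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in sorted(runtime_versions().items()):
        print("%-8s %s" % (key, value))
```

Run it on the Ambari Server and on each worker (for example as `python versions.py`, a hypothetical file name) and compare the output line by line.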