All DataNodes are up, but the dashboard shows only 3 of 5 as live

In our Ambari cluster, all DataNodes are up, but the dashboard shows only 3 of 5 as live.

Why does the dashboard see only 3 of 5?

What needs to be checked or synced here?

* Note: two hosts (worker machines) were added to the Ambari cluster recently. We restarted the ambari-agent and rebooted these servers, but the status on the dashboard is still 3/5.

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

DataNode Issue:

Ambari Server fetches some information, such as DataNode status, from the NameNode. Since the NameNode reports only 3 live DataNodes, the other 2 DataNodes are not communicating properly with it. Those DataNodes may well be running (with a valid PID file), but they are not reaching the NameNode, so Ambari simply shows what the NameNode tells it.

So at this point there is no issue on the Ambari side: it is displaying the live DataNode information it gets from the NameNode.

To investigate why those DataNodes are not communicating with the NameNode (that is, why the NameNode does not show all 5 nodes as live), we need to look at the NameNode log as well as the DataNode logs of the problematic DataNodes.
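One way to confirm what the NameNode itself reports, independent of Ambari, is to ask it directly with `hdfs dfsadmin -report`. The following is only a minimal sketch: it assumes the report contains summary lines of the form `Live datanodes (N):` and `Dead datanodes (N):` (as in Hadoop 2.x output), and the helper name `datanode_counts` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfsadmin -report` text on stdin and print
# the live and dead DataNode counts as the NameNode sees them.
# Assumes summary lines like "Live datanodes (3):" and "Dead datanodes (2):".
datanode_counts() {
    awk '
        /^Live datanodes \([0-9]+\):/ { gsub(/[^0-9]/, "", $3); live = $3 }
        /^Dead datanodes \([0-9]+\):/ { gsub(/[^0-9]/, "", $3); dead = $3 }
        END { printf "live=%d dead=%d\n", live, dead }
    '
}

# Typical use on a cluster node, run as the HDFS superuser (commonly "hdfs"):
#   sudo -u hdfs hdfs dfsadmin -report | datanode_counts
```

If the two new workers are counted as dead, or do not appear in the report at all, the DataNode logs on those hosts are the next place to look.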

Regarding the Agent communication with Ambari Server:

Unable to reconnect to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com

Please check whether those hosts resolve the Ambari Server hostname and IP address properly. Check the "/etc/hosts" entries on those hosts to verify that the Ambari host resolves correctly.

Also check whether a blocked port or a firewall is preventing those hosts from reaching Ambari Server port 8441.

# cat /etc/hosts
# nc -v master02 8441
(OR)
# telnet master02 8441
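The same checks can be wrapped in a small script so they are easy to repeat on each worker. This is a sketch rather than part of the original answer: it assumes `bash` (for its `/dev/tcp` pseudo-device), `getent`, and `timeout` are available, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for repeating the checks on each worker.

# Print the address the host name resolves to (consults /etc/hosts and DNS
# according to nsswitch.conf); returns non-zero if it does not resolve.
resolves() {
    getent hosts "$1"
}

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, using the host name and port from the agent error:
#   resolves master02 && port_open master02 8441 && echo "reachable"
```

Note that an open port only shows that something is listening; it does not prove that the HTTPS handshake between agent and server succeeds.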

Please confirm that "master02" is actually your Ambari Server host. If it is not, check the "/etc/ambari-agent/conf/ambari-agent.ini" file to verify that the Ambari Server hostname is set correctly there.
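To read the configured server name without opening the file by hand, something like the following could be used. It is a sketch that assumes the agent configuration keeps the server name in a `hostname` key under a `[server]` section; the helper name `agent_server_host` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Ambari Server host name an agent is configured
# to use, i.e. the "hostname" key in the [server] section of ambari-agent.ini.
# Assumes an INI layout with a [server] section containing hostname=<name>.
agent_server_host() {
    local ini="${1:-/etc/ambari-agent/conf/ambari-agent.ini}"
    awk -F'=' '
        /^\[/ { in_server = ($0 ~ /^\[server\]/) }   # track the current section
        in_server && $1 ~ /^[ \t]*hostname[ \t]*$/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)           # trim surrounding blanks
            print $2
            exit
        }
    ' "$ini"
}

# Example, run on a worker:
#   agent_server_host
```

If the printed name is not the Ambari Server host, correct it in the file and restart the agent with `ambari-agent restart`.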


21 REPLIES

From the agent log on one of the workers I see:

ERROR 2017-11-23 16:11:07,601 Controller.py:456 - Unable to reconnect to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com (attempts=5, details=Request to https://master02:8441/agent/v1/heartbeat/worker06.sys674.com failed due to [Errno 111] Connection refused)

Michael-Bronson


Yes, master02 is our Ambari Server host.

Michael-Bronson

I checked via telnet: port 8441 is reachable and host name resolution is OK.

Michael-Bronson

Hi Jay, what can we do next?

Michael-Bronson

Master Mentor

@Michael Bronson

We see two issues here:

Issue-1). DataNode live status. This is on the HDFS side, because the NameNode is showing only 3 live nodes instead of 5 out of 5.

>>> To investigate that issue we will need the following:

a. NameNode logs (complete log)

b. DataNode logs from the problematic hosts (complete log)

Issue-2). The agent is reporting "Connection refused" for https://master02:8441. This can also be related to an OpenSSL / Python issue, because the Ambari agent communicates with the Ambari server over HTTPS on ports 8441 and 8440 using Python and the OpenSSL libraries.

>>> So we will need to see ambari-server.log as well as the complete ambari-agent.log to get more details about this issue. We also need to check whether the OpenSSL, Python, and OS versions are the same on all hosts.
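A minimal sketch for comparing a version string across hosts is shown here. It assumes passwordless SSH from the machine where it runs; the host list in the example and the helper name `versions_match` are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "host<TAB>version" lines on stdin. Succeeds only if
# every host reports the same version string; otherwise prints the full report
# to stderr and fails.
versions_match() {
    local report distinct
    report=$(cat)
    distinct=$(printf '%s\n' "$report" | cut -f2- | sort -u | wc -l)
    if [ "$distinct" -le 1 ]; then
        return 0
    fi
    printf '%s\n' "$report" >&2
    return 1
}

# Example (host names are placeholders; assumes passwordless SSH):
#   for h in master02 worker06.sys674.com; do
#       printf '%s\t%s\n' "$h" "$(ssh "$h" 'openssl version')"
#   done | versions_match && echo "OpenSSL versions match"
```

For Python, replace the remote command with `python -V 2>&1`, since Python 2 prints its version to stderr.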

I checked the logs on ambari-server. There are a lot of details, but I see this: Unable to propagate version for ServiceHostComponent on component: SPARK2_CLIENT, host: worker06.sys674.com. Error:

Michael-Bronson

How do I verify the OpenSSL/Python versions?

Michael-Bronson

python -V

Python 2.7.5

This version is the same on the ambari-server and the worker machines.

Michael-Bronson

openssl version

OpenSSL 1.0.1e-fips 11 Feb 2013

This is the same on all hosts.

Michael-Bronson