Created 08-13-2019 08:19 PM
Hi,
I am new to YARN and I am not able to restart the NodeManagers on some of the nodes. Could you please help me out?
Created 08-14-2019 08:45 AM
Can you share your HDP setup: the HDP version, the number of nodes, whether it is HA and whether it is Kerberized? Please also share the exact error being thrown, preferably with logs.
Typically a node manager should run on the same node as the data node.
HTH
Created 08-14-2019 01:57 PM
The HDP version is 3.0.1.0 with 16 nodes, 4 of which are worker nodes, and the environment is Kerberized. The node wn03 had to be rebooted by the infrastructure team for maintenance. I stopped all the services on that worker node before the reboot and restarted them once wn03 was back up.
This morning I got the following warning in Ambari:
Connection failed to https://hd-dev-c1-wn03.hadoop.com:8044 (Execution of 'curl --location-trusted -k --negotiate -u : -b /var/lib/ambari-agent/tmp/cookies/8d14b5f3-5456-4599-9510-0036effff91d -c /var/lib/ambari-agent/tmp/cookies/8d14b5f3-5456-4599-9510-0036effff91d -w '%{http_code}' https://hd-dev-c1-wn03.hadoop.com:8044 --connect-timeout 5 --max-time 7 -o /dev/null 1>/tmp/tmpGaZLZ4 2>/tmp/tmpag2rxg' returned 7.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed connect to hd-dev-c1-wn03.hadoop.com:8044; Connection refused
000)
Created on 08-19-2019 12:26 AM - edited 08-19-2019 12:27 AM
As the error says:
Connection failed to https://xxxxxxxxx-wn03.hadoop.com:8044
So can you please first check on the host wn03 whether port 8044 is listening? Also please check that the firewall is disabled, just to make sure that the port can be reached remotely.
# netstat -tnlpa | grep 8044
If the port is not listening, then obviously the connection cannot be established to it via curl (as Ambari is attempting).
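As a rough sketch of that check, the helper below reads netstat-style output on stdin and succeeds only if some socket in LISTEN state is bound to the given port. The name port_listening is made up for illustration and is not part of Hadoop or Ambari. It assumes the usual Linux net-tools column layout, where the local address is field 4 and the state is field 6.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Ambari or Hadoop tool).
# Reads `netstat -tnl` style output on stdin and returns 0 if a socket
# in LISTEN state is bound to the given TCP port, 1 otherwise.
# Assumes field 4 is the local address and field 6 is the state.
port_listening() {
  local port="$1"
  awk -v p="$port" '
    # Match only addresses ending in ":<port>", so 18044 does not match 8044.
    $6 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
    END { exit !found }
  '
}
```

On wn03 it could be used like this: netstat -tnl | port_listening 8044 && echo "8044 is listening" || echo "nothing is listening on 8044"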
In that case, please check the NodeManager log on the host "wn03" to find out whether it is showing any errors. Also please check and share the ResourceManager log.
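To narrow down what to share, a small filter like the one below can pull the most recent ERROR and FATAL lines out of a log file. This is only a sketch: log_errors is a made-up name, and it assumes the usual log4j-style layout where the level appears as a separate word after the timestamp. The log location is not given in this thread, so pass whatever path the YARN log directory in your Ambari configuration points to.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Hadoop).
# Prints the last N lines (default 20) at ERROR or FATAL level from a log file.
# Assumes the level is a space-delimited word, as in
# "2019-08-14 08:00:01,002 ERROR ...".
log_errors() {
  local file="$1" count="${2:-20}"
  grep -E ' (ERROR|FATAL) ' "$file" | tail -n "$count"
}
```

For example: log_errors /path/to/nodemanager.log 50 (the path here is a placeholder, not a real location).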
Created 09-05-2019 08:42 AM
Hi Koffi,
Could you please share the ResourceManager log covering the timeframe when the issue happened?
Thanks
AKR