Can't restart the ResourceManager on node

Explorer

Hi,

I am new to YARN and I am not able to restart the NodeManagers on some of the nodes. Could you please help me out?

Re: Can't restart the ResourceManager on node

Mentor

@Koffi

Can you share your HDP setup: HDP version, number of nodes, HA or not, Kerberized or not? Please also share the exact error being thrown, preferably with logs.

Typically a NodeManager should run on the same node as the DataNode.

HTH

Re: Can't restart the ResourceManager on node

Explorer

The HDP version is 3.0.1.0 with 16 nodes, 4 of which are worker nodes, and the environment is Kerberized. The node wn03 had to be rebooted by the infrastructure team for maintenance. I stopped all the services on that worker node before the reboot and restarted them once wn03 was back up.

This morning I got the following warning in Ambari:

Connection failed to https://hd-dev-c1-wn03.hadoop.com:8044 (Execution of 'curl --location-trusted -k --negotiate -u : -b /var/lib/ambari-agent/tmp/cookies/8d14b5f3-5456-4599-9510-0036effff91d -c /var/lib/ambari-agent/tmp/cookies/8d14b5f3-5456-4599-9510-0036effff91d -w '%{http_code}' https://hd-dev-c1-wn03.hadoop.com:8044 --connect-timeout 5 --max-time 7 -o /dev/null 1>/tmp/tmpGaZLZ4 2>/tmp/tmpag2rxg' returned 7.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed connect to hd-dev-c1-wn03.hadoop.com:8044; Connection refused
000)

Re: Can't restart the ResourceManager on node

Super Mentor

@Koffi 

As the error says:

Connection failed to https://xxxxxxxxx-wn03.hadoop.com:8044

So can you please first check on host wn03 whether port 8044 is listening? Also please check whether the firewall is disabled, just to make sure that the port can be reached remotely.

 

# netstat -tnlpa | grep 8044
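
The netstat command above only shows whether something is listening locally on wn03. To also confirm that the port is reachable from another host (for example the Ambari server, which would catch a firewall problem), a rough sketch along these lines may help. The function name `port_is_open` and the 5 second timeout are my own choices for illustration, not anything Ambari provides; the hostname and port are the ones from the alert.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the hostname and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures
        return False


# Example call, run from the Ambari server or any other cluster node:
#   port_is_open("hd-dev-c1-wn03.hadoop.com", 8044)
```

A False result from a remote host combined with a listening port in netstat on wn03 would point towards the firewall; False in both places would suggest the NodeManager did not come up after the reboot.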

 

If the port is not listening, then the connection obviously cannot be established via curl (which is what Ambari is attempting).
In that case please check the NodeManager log on host "wn03" to see whether it shows any error, and please also check and share the ResourceManager log.
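
If the logs are large, something like the following could pull out only the severe entries before sharing them. This is just a sketch: `severe_lines` is a hypothetical helper, it assumes the usual log4j-style level names ERROR and FATAL appear in the log lines, and the log path is not given anywhere in this thread, so pass in whatever location your YARN log directory uses.

```python
import re

# Assumption: the logs use log4j-style level names.
SEVERE = re.compile(r"\b(ERROR|FATAL)\b")


def severe_lines(log_path, limit=20):
    """Return up to the last `limit` lines of the log that contain ERROR or FATAL."""
    hits = []
    # errors="replace" keeps the scan going even if the log has odd bytes in it
    with open(log_path, errors="replace") as fh:
        for line in fh:
            if SEVERE.search(line):
                hits.append(line.rstrip("\n"))
    return hits[-limit:]


# Example call (the path is a placeholder, substitute your own):
#   for line in severe_lines("/path/to/nodemanager.log"):
#       print(line)
```

Note that stack traces following an ERROR line will not be captured by this filter, so it is only a first pass to find where to look in the full log.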