Support Questions


Nodemanager bad health and connection refused

Contributor

@Jay Kumar SenSharma maybe you can help me with this one instead?

I have a 4-node cluster. All four are DataNodes and one node is also the ResourceManager. My Ambari installation only installed a NodeManager on my master ResourceManager node. Assuming this is correct (please let me know if it is not), I have been getting errors about my NodeManager. It says the health is bad because it cannot connect:

Connection failed to http://ncienspk01.nciwin.local:8042/ws/v1/node/info (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_nodemanager_health.py", line 171, in execute
    url_response = urllib2.urlopen(query, timeout=connection_timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 1244, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.7/urllib2.py", line 1214, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
)
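For anyone debugging a similar alert: the check can be reproduced by hand. A minimal sketch, with the host and port taken from the alert above (everything else is illustrative):

```shell
#!/bin/sh
# Rebuild the exact URL the Ambari alert probes (values from the alert above).
NM_HOST="ncienspk01.nciwin.local"
NM_PORT=8042
NM_URL="http://${NM_HOST}:${NM_PORT}/ws/v1/node/info"
echo "checking ${NM_URL}"

# On the NodeManager host itself, useful first checks (run manually):
#   ss -ltn | grep ":${NM_PORT}"    # is anything listening on 8042?
#   ps -ef | grep -i nodemanager    # is the NodeManager JVM running at all?
# Then hit the same REST endpoint the alert uses:
#   curl -sf "${NM_URL}" || echo "connection refused - NodeManager web UI is down"
```

"Connection refused" (as opposed to a timeout) usually means the NodeManager process is not running at all or died on startup, so the NodeManager log (typically under /var/log/hadoop-yarn/ on HDP) is the next place to look.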

Many of my services had corrupt installs and I did a re-install; that may be the case here as well. Any thoughts on how to re-install?
Also, should I have a NodeManager on every node? If so, how do I install them and connect them?

Thanks for your help! Dan

2 ACCEPTED SOLUTIONS

Contributor

@Jay Kumar SenSharma
When I try to start services now I'm getting:

For HDFS Client Install

RuntimeError: Failed to execute command '/usr/bin/yum -y install hadoop_3_0_0_0_1634', exited with code '1', message: 'Error unpacking rpm package hadoop_3_0_0_0_1634-3.1.0.3.0.0.0-1634.x86_64'

For Hive Client Install

RuntimeError: Failed to execute command '/usr/bin/yum -y install hive_3_0_0_0_1634-hcatalog', exited with code '1', message: 'Error unpacking rpm package hadoop_3_0_0_0_1634-3.1.0.3.0.0.0-1634.x86_64'
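In case it helps anyone hitting the same thing: "Error unpacking rpm package" usually means a half-installed rpm left over from an interrupted yum transaction. A rough recovery sketch (run as root on the affected node; the package name is the one from the error above, and the destructive steps are deliberately left commented out):

```shell
#!/bin/sh
# Sketch: recover from a half-installed package after "Error unpacking rpm package".
PKG="hadoop_3_0_0_0_1634"
echo "attempting recovery for ${PKG}"

# Cluster-specific steps, to run manually:
#   yum clean all                              # drop stale repo metadata and cache
#   yum-complete-transaction --cleanup-only    # clear interrupted transactions (needs yum-utils)
#   rpm -qa | grep "${PKG}"                    # see what (if anything) got half-installed
#   yum -y reinstall "${PKG}"                  # retry the unpack over the broken install
```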

View solution in original post

Contributor

So I resolved all this. I just followed the steps here to remove all my packages, then deleted the contents of the HDP install directory:

rm -rf /usr/hdp/
Then in Ambari I used the "Start all Services" command and it went through and installed everything again for me.

Then to solve the NodeManager issue I did the spark-yarn install, which gave me the missing jar that I needed, and then just copied that dir:
/usr/hdp/3.0.0.0-.../spark2/aux/
to all the other nodes in my cluster. Now all my NodeManagers are coming up and things are looking good.
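The copy step can be sketched like this; note that the HDP version directory and the other host names are placeholders (only ncienspk01 is named in this thread), so substitute your own:

```shell
#!/bin/sh
# Sketch: push the spark2/aux dir (with the yarn shuffle jar) to the other nodes.
HDP_VER="3.0.0.0-1634"                 # assumption: matches the -1634 build in the rpm names
SRC="/usr/hdp/${HDP_VER}/spark2/aux"
for node in node02 node03 node04; do   # hypothetical host names -- use your own
  echo "would copy ${SRC} to ${node}"
  # scp -r "${SRC}" "root@${node}:/usr/hdp/${HDP_VER}/spark2/"
done
```

After the copy, restart the NodeManagers from Ambari so they pick the jar up.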
I'm creating another post about resolving my Timeline Service V2.0 issue which is somehow still persisting.

View solution in original post

13 REPLIES

Contributor

@Jay Kumar SenSharma
When I try to start services now I'm getting:

For HDFS Client Install

RuntimeError: Failed to execute command '/usr/bin/yum -y install hadoop_3_0_0_0_1634', exited with code '1', message: 'Error unpacking rpm package hadoop_3_0_0_0_1634-3.1.0.3.0.0.0-1634.x86_64'

For Hive Client Install

RuntimeError: Failed to execute command '/usr/bin/yum -y install hive_3_0_0_0_1634-hcatalog', exited with code '1', message: 'Error unpacking rpm package hadoop_3_0_0_0_1634-3.1.0.3.0.0.0-1634.x86_64'

Contributor

@Jay Kumar SenSharma I'm definitely in a jam now. Really hoping you can help me. A bit scared to touch anything at this point.

Contributor

So I resolved all this. I just followed the steps here to remove all my packages, then deleted the contents of the HDP install directory:

rm -rf /usr/hdp/
Then in Ambari I used the "Start all Services" command and it went through and installed everything again for me.

Then to solve the NodeManager issue I did the spark-yarn install, which gave me the missing jar that I needed, and then just copied that dir:
/usr/hdp/3.0.0.0-.../spark2/aux/
to all the other nodes in my cluster. Now all my NodeManagers are coming up and things are looking good.
I'm creating another post about resolving my Timeline Service V2.0 issue which is somehow still persisting.

Expert Contributor

I'm glad that's all sorted now. Another way would have been to delete the particular node from the cluster, re-add it, and then add the Spark client on it. I did that on one of my test clusters recently and it worked.