New Contributor
Posts: 3
Registered: ‎01-25-2016
Accepted Solution

Web Server Status Bad Health

Hi all,

 

I have successfully installed Cloudera Manager 5.5.1 on a private cluster with only HDFS, YARN and Spark.

 

I keep getting Health Issues every 10 - 15 minutes reporting "Web Server Status : The Cloudera Manager Agent got an unexpected response from this role's web server."

 

The corresponding entry in the host's Cloudera Manager Agent log is the following:

 

[29/Jan/2016 16:51:32 +0000] 1237 Monitor-HostMonitor throttling_logger ERROR    (30 skipped) Failed to collect NTP metrics
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/ntp_monitor.py", line 37, in collect
    result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/ntp_monitor.py", line 30, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 49, in subprocess_with_timeout
    p = subprocess.Popen(**kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

And another one:

 

[29/Jan/2016 16:48:32 +0000] 1237 Monitor-GenericMonitor throttling_logger ERROR    (1 skipped) Error fetching metrics at 'http://host-hd-01.corp.nodalpoint.com:8086/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/generic/metric_collectors.py", line 165, in collect_and_parse
    simplejson.load(opened_url))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/simplejson-2.1.2-py2.7-linux-x86_64.egg/simplejson/__init__.py", line 324, in load
    return loads(fp.read(),
  File "/usr/lib64/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib64/python2.7/httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 9] Bad file descriptor

Has anyone else noticed similar issues?

 

Thank you

Cloudera Employee
Posts: 276
Registered: ‎07-08-2013

Re: Web Server Status Bad Health

The first issue has to do with NTP [1]. The second occurs when the Agent attempted to read the JSON contents (possibly Service Monitor metrics) and encountered an error, hence the

  

Error fetching metrics at 'http://host-hd-01.corp.nodalpoint.com:8086/jmx'

 

Do you know which role the health check is reporting for? For reference, can you attach a screenshot?

 

[1] http://www.cloudera.com/documentation/enterprise/latest/topics/install_cdh_enable_ntp.html
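
To reproduce the failing metrics fetch outside the Agent, one option is to request the same /jmx URL by hand and check that the body parses as JSON. The following is only a minimal sketch in Python 3 (the Agent in the tracebacks runs Python 2.7), and the helper names `parse_jmx` and `fetch_jmx` are hypothetical. It assumes the endpoint returns the usual Hadoop-style document with a top-level "beans" array; whether every role in this thread uses that layout is an assumption.

```python
import json
import urllib.request


def parse_jmx(payload):
    """Parse a /jmx JSON document and return the bean names it lists.

    Assumes the Hadoop-style layout: {"beans": [{"name": ...}, ...]}.
    """
    doc = json.loads(payload)
    return [bean.get("name", "<unnamed>") for bean in doc.get("beans", [])]


def fetch_jmx(url, timeout=10):
    """Fetch a /jmx URL and return the bean names (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_jmx(resp.read().decode("utf-8"))
```

For example, `fetch_jmx("http://host-hd-01.corp.nodalpoint.com:8086/jmx")` uses the URL from the log above. If this succeeds consistently while the Agent still logs errors, the problem is more likely on the Agent side than in the role's web server.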

New Contributor
Posts: 3
Registered: ‎01-25-2016

Re: Web Server Status Bad Health

Thank you, Michalis, for your quick response.

 

Regarding the first issue, you mention that it is related to NTP.

My operating system is RHEL 7.1, which uses the chrony service by default instead of NTP.

Do you recommend replacing the chrony service with the ntp service?

 

Regarding the second issue, I am providing log entries and screenshots from three different services where this issue occurs:

 

a) from the Host Monitor

[01/Feb/2016 10:44:34 +0000] 1237 Monitor-GenericMonitor throttling_logger ERROR    (8 skipped) Error fetching metrics at 'http://host-hd-01.corp.nodalpoint.com:8086/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/generic/metric_collectors.py", line 165, in collect_and_parse
    simplejson.load(opened_url))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/simplejson-2.1.2-py2.7-linux-x86_64.egg/simplejson/__init__.py", line 324, in load
    return loads(fp.read(),
  File "/usr/lib64/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib64/python2.7/httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 9] Bad file descriptor

with its corresponding screenshot

 

Host Monitor.png

 

b) from a YARN NodeManager

[01/Feb/2016 10:37:19 +0000] 1363 Monitor-GenericMonitor throttling_logger ERROR    (6 skipped) Error fetching metrics at 'http://host-hd-03.corp.nodalpoint.com:8042/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/generic/metric_collectors.py", line 165, in collect_and_parse
    simplejson.load(opened_url))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/simplejson-2.1.2-py2.7-linux-x86_64.egg/simplejson/__init__.py", line 324, in load
    return loads(fp.read(),
  File "/usr/lib64/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib64/python2.7/httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 9] Bad file descriptor

with its corresponding screenshot

 

Node Manager.png

 

c) and from the NameNode

[01/Feb/2016 10:53:34 +0000] 1237 Monitor-GenericMonitor throttling_logger ERROR    (1 skipped) Error fetching metrics at 'http://host-hd-01.corp.nodalpoint.com:8087/jmx'
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/generic/metric_collectors.py", line 165, in collect_and_parse
    simplejson.load(opened_url))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/simplejson-2.1.2-py2.7-linux-x86_64.egg/simplejson/__init__.py", line 324, in load
    return loads(fp.read(),
  File "/usr/lib64/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib64/python2.7/httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
error: [Errno 9] Bad file descriptor

with its corresponding screenshot

 

Name Node.png

 

 

Please tell me if you require any more information

 

Thanks again for your support

 

Filaretos

New Contributor
Posts: 3
Registered: ‎01-25-2016

Re: Web Server Status Bad Health

The issue is resolved.

 

I replaced the chrony service with the NTP service on all my hosts, following Michalis's recommendation, and all errors stopped.

This includes not only the errors explicitly stating "Failed to collect NTP metrics" but also all the other errors. Apparently all of these errors were somehow related to the inability to collect NTP metrics.
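
For anyone who finds this thread later: the `OSError: [Errno 2] No such file or directory` in the NTP traceback is what `subprocess.Popen` raises when the executable it is asked to run does not exist. A quick way to see which time-sync client tools are present on a host might look like the sketch below (Python 3). The candidate list (`ntpq`, `ntpdc`, `chronyc`) is my assumption; the thread does not say which binary the Agent actually invokes, and the function names are hypothetical.

```python
import shutil
import subprocess

# Assumed candidates; the thread does not confirm which command the Agent calls.
CANDIDATE_COMMANDS = ["ntpq", "ntpdc", "chronyc"]


def find_time_sync_commands(candidates=CANDIDATE_COMMANDS):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in candidates}


def try_run(args, timeout=5):
    """Run a command and return (ok, detail).

    A missing binary raises FileNotFoundError, the Python 3 form of
    OSError with errno 2 seen in the Agent log.
    """
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError as exc:
        return False, "missing executable: %s" % exc
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return proc.returncode == 0, proc.stdout or proc.stderr


if __name__ == "__main__":
    for name, path in find_time_sync_commands().items():
        print(name, "->", path or "not found on PATH")
```

On a host where only chrony is installed, I would expect the NTP tools to show as not found, which would match the traceback, but I have not verified this against the Agent's source.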

 

Thank you!