The Cloudera Manager Agent is not able to communicate with this role's web server

Contributor

Hi Community,

I recently upgraded from CM 7.6.7 to CM 7.11.3 and from CDP 7.1.7 SP2 to CDP 7.1.7 SP3.

Since the upgrade, many services are showing a web server error in Cloudera Manager, as shown below. One of those services is HDFS.

web-server-error.png

 

When I checked the cloudera-scm-agent log, I found the following error.

 

 

[21/Oct/2024 09:09:09 +0100] 2414 GM IMPALAD throttling_logger ERROR    Error fetching metrics at 'https://host.domain.com:25000/jsonmetrics?json'
Traceback (most recent call last):
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
    opened_url = urlopen_with_retry_on_authentication_errors(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
    return function()
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
    return self._urlopen_callout(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1244, in http_error_401
    retry = self.http_error_auth_reqed('www-authenticate',
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1124, in http_error_auth_reqed
    return self.retry_http_digest_auth(req, authreq)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1138, in retry_http_digest_auth
    resp = self.parent.open(req, timeout=req.timeout)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 388, in http_error_default
    raise e
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 382, in http_error_default
    return old(self, req, fp, code, msg, hdrs)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
[21/Oct/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses    INFO     Creating a connection to the SERVICEMONITOR.
[21/Oct/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses    INFO     Creating a connection to the HOSTMONITOR.
[21/Oct/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon       WARNING  Monitor slow to respond in readiness check: 45s GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07
[21/Oct/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon       INFO     Monitor expired: ('GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07',)
[21/Oct/2024 09:09:55 +0100] 2414 GM NODEMANAGER throttling_logger ERROR    Error fetching metrics at 'https://host.domain.com:61006/jmx'
Traceback (most recent call last):
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
    neg_hdr = self.generate_request_header(req, headers, neg_value)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
    result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure.  Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
    opened_url = urlopen_with_retry_on_authentication_errors(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
    return function()
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
    return self._urlopen_callout(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
    retry = self.http_error_auth_reqed(host, req, headers)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
    return self.retry_http_kerberos_auth(req, headers, neg_value)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
    log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[21/Oct/2024 09:09:55 +0100] 2414 GM DATANODE throttling_logger ERROR    Error fetching metrics at 'https://host.domain.com:9865/jmx'
Traceback (most recent call last):
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
    neg_hdr = self.generate_request_header(req, headers, neg_value)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
    result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure.  Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
    opened_url = urlopen_with_retry_on_authentication_errors(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
    return function()
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
    return self._urlopen_callout(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
    retry = self.http_error_auth_reqed(host, req, headers)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
    return self.retry_http_kerberos_auth(req, headers, neg_value)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
    log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[21/Oct/2024 09:09:55 +0100] 2414 GM REGIONSERVER throttling_logger ERROR    Error fetching metrics at 'https://host.domain.com:61005/jmx'
Traceback (most recent call last):
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
    neg_hdr = self.generate_request_header(req, headers, neg_value)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
    result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure.  Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
    opened_url = urlopen_with_retry_on_authentication_errors(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
    return function()
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
    return self._urlopen_callout(
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
    return opener.open(url, data, timeout)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
    retry = self.http_error_auth_reqed(host, req, headers)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
    return self.retry_http_kerberos_auth(req, headers, neg_value)
  File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
    log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
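
A side note on the tracebacks above: the final `TypeError: 'GSSError' object is not subscriptable` appears to be a secondary failure raised while logging, not the root cause. The line shown from `urllib_kerberos/__init__.py` indexes the exception directly (`e[0][0]`), which worked in Python 2 but not in Python 3, where the payload lives in `e.args`. The underlying problem is the `kerberos.GSSError` ('Cryptosystem internal error'). The sketch below reproduces the pattern with a stand-in exception class (`FakeGSSError` is hypothetical, so that the example runs without the `kerberos` module) and shows how the same fields can be read on Python 3.

```python
class FakeGSSError(Exception):
    """Hypothetical stand-in for kerberos.GSSError (two (message, code) tuples)."""


def describe_gss_error(exc):
    """Format the major/minor messages via exc.args, which works on Python 3."""
    major, minor = exc.args
    return "GSSAPI Error: %s/%s" % (major[0], minor[0])


def raise_sample():
    # Same payload shape as the GSSError printed in the agent log.
    raise FakeGSSError(
        ("Unspecified GSS failure.  Minor code may provide more information", 851968),
        ("Cryptosystem internal error", -1765328206),
    )


try:
    raise_sample()
except FakeGSSError as e:
    try:
        e[0][0]  # Python 2 idiom seen in the traceback; raises TypeError on Python 3
    except TypeError as masked:
        print("masking error:", masked)
    print(describe_gss_error(e))
```

When reading the agent log, it is therefore the first exception in each pair that is worth investigating.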

 

 

Please help me out here.

 

3 REPLIES

Master Collaborator

Hello @sayebogbon 

Thank you for reaching out.

Can you please confirm that the nodes are heartbeating normally? You can check the heartbeat on the Hosts >> All Hosts page in the CM UI.

Regarding the Kerberos errors:

Can you please log in to the node's command line and try a manual kinit?

# cd /var/run/cloudera-scm-agent/process

Go into the latest HDFS process directory and run kinit manually with the keytab found there.

Also compare /etc/krb5.conf with the one on the working nodes.

Finally, has an OS upgrade been performed as well?
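
The manual check described above can be sketched roughly as follows. This is only an illustration: the glob pattern `*hdfs-DATANODE*`, the keytab name `hdfs.keytab`, and the principal `hdfs/<fqdn>` are assumptions that need to be adapted to the role and realm on your host. The script only prints the commands so they can be reviewed before being run by hand.

```python
import glob
import os
import shlex
import socket


def latest_process_dir(base="/var/run/cloudera-scm-agent/process",
                       pattern="*hdfs-DATANODE*"):
    """Return the most recently modified directory matching pattern, or None."""
    candidates = [p for p in glob.glob(os.path.join(base, pattern))
                  if os.path.isdir(p)]
    return max(candidates, key=os.path.getmtime) if candidates else None


def kinit_commands(process_dir, principal, keytab="hdfs.keytab"):
    """Build the shell commands for a manual keytab login and ticket inspection."""
    keytab_path = shlex.quote(os.path.join(process_dir, keytab))
    return [
        "klist -kte " + keytab_path,  # encryption types stored in the keytab
        "kinit -kt %s %s" % (keytab_path, shlex.quote(principal)),
        "klist -e",                   # encryption types of the ticket obtained
    ]


if __name__ == "__main__":
    proc_dir = latest_process_dir()
    if proc_dir is None:
        print("no matching process directory found")
    else:
        # Assumed principal form; append @REALM if the default realm differs.
        for cmd in kinit_commands(proc_dir, "hdfs/" + socket.getfqdn()):
            print(cmd)
```

Comparing the encryption types reported by `klist -kte` and `klist -e` with those on a working node is usually more informative than checking only whether kinit succeeds.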

 

 

 

Contributor

Hi @upadhyayk04 

Thanks for your response. I tried to reply to your message earlier but had no luck. I hope this one gets through.
Firstly, the hosts are heartbeating.
Secondly, /etc/krb5.conf appears to be the same as on another working host (the Hue server host in this case). The Web Server Status issue is the same across HDFS, HBase, YARN, and Impala.
Thirdly, I had tried the manual kinit before, but I still get the same error.
After running a manual kinit (kinit -k -t hdfs.keytab hdfs/host.my-default-realm.com) from the latest DataNode process directory, I ran klist -e and got the following.

Valid starting     Expires            Service principal
10/11/24 23:43:47  11/11/24 09:43:47  krbtgt/my-default-realm.COM@my-default-realm.COM
        renew until 17/11/24 23:43:47, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96



Below are the Kerberos Encryption Types configured in the Cloudera Manager console:
kerberos_encryption_type.png

Below is part of the host's /etc/krb5.conf:

[libdefaults]
 renew_lifetime = 604800
 ticket_lifetime = 36000
 udp_preference_limit = 1
 permitted_enctypes = rc4-hmac aes256-cts aes128-cts
 default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
 default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
 default_realm = my-default-realm.com
 default_etypes = arcfour-hmac-md5
 default_etypes_des = des-cbc-crc
 allow_weak_crypto = true

 forwardable = true
 default_keytab_name = /etc/opt/quest/vas/host.keytab
[libvas]
 site-name-override = iNET-LDAP
 use-dns-srv = true
 use-tcp-only = true

 auth-helper-timeout = 60


Finally, the OS upgrade has not been performed yet. We're still on Red Hat OL7.
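
One detail that stands out in the output above is that the ticket's session key is `arcfour-hmac` and that `rc4-hmac` is listed first in every enctype setting. Whether this is what triggers the 'Cryptosystem internal error' is not confirmed anywhere in this thread; it is only a hypothesis worth checking. A minimal sketch for flagging legacy encryption types in the `[libdefaults]` section follows. The prefix list and the helper names are illustrative assumptions, and the sample text is an excerpt of the configuration above.

```python
LEGACY_PREFIXES = ("rc4", "arcfour", "des-")  # illustrative list, an assumption


def parse_libdefaults(text):
    """Return key/value pairs from the [libdefaults] section of a krb5.conf string."""
    settings, in_section = {}, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("["):
            in_section = line.lower() == "[libdefaults]"
            continue
        if in_section and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def legacy_enctypes(settings, keys=("permitted_enctypes",
                                    "default_tgs_enctypes",
                                    "default_tkt_enctypes")):
    """Map each enctype setting to the legacy entries it lists, in order."""
    found = {}
    for key in keys:
        entries = settings.get(key, "").replace(",", " ").split()
        legacy = [e for e in entries if e.lower().startswith(LEGACY_PREFIXES)]
        if legacy:
            found[key] = legacy
    return found


SAMPLE = """
[libdefaults]
 permitted_enctypes = rc4-hmac aes256-cts aes128-cts
 default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
 default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
 allow_weak_crypto = true
[libvas]
 use-tcp-only = true
"""

for key, legacy in legacy_enctypes(parse_libdefaults(SAMPLE)).items():
    print(key, "lists legacy enctypes:", ", ".join(legacy))
```

Running the same check against the krb5.conf of a host whose roles report healthy web server status would show whether the enctype lists really match.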

Contributor

I would appreciate any support from anyone.