Created 10-21-2024 02:45 AM
Hi Community,
I have recently upgraded from CM 7.6.7 to CM 7.11.3 and CDP 7.1.7 SP2 to CDP 7.1.7 SP3.
Many services are now showing a web server error in Cloudera Manager, as shown below. One of those services is HDFS.
When I checked the cloudera-scm-agent log, I found the following error.
[21/Oct/2024 09:09:09 +0100] 2414 GM IMPALAD throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:25000/jsonmetrics?json'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1244, in http_error_401
retry = self.http_error_auth_reqed('www-authenticate',
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1124, in http_error_auth_reqed
return self.retry_http_digest_auth(req, authreq)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1138, in retry_http_digest_auth
resp = self.parent.open(req, timeout=req.timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 388, in http_error_default
raise e
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 382, in http_error_default
return old(self, req, fp, code, msg, hdrs)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
[21/Oct/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[21/Oct/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[21/Oct/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon WARNING Monitor slow to respond in readiness check: 45s GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07
[21/Oct/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon INFO Monitor expired: ('GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07',)
[21/Oct/2024 09:09:55 +0100] 2414 GM NODEMANAGER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61006/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[21/Oct/2024 09:09:55 +0100] 2414 GM DATANODE throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:9865/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[21/Oct/2024 09:09:55 +0100] 2414 GM REGIONSERVER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61005/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
Please help me out here.
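A side note on the tracebacks above: the final TypeError looks like a secondary failure in the error-logging line (`e[0][0]`), because Python 3 exceptions cannot be indexed directly. The underlying problem is the kerberos.GSSError ('Cryptosystem internal error'). Below is a minimal sketch of that behaviour. The `GSSError` class here is a stand-in I defined so the snippet runs without the kerberos module, and `format_gss_error` is a hypothetical helper, not the agent's actual code.

```python
class GSSError(Exception):
    """Stand-in for kerberos.GSSError (assumption: it carries two tuples in .args)."""


def format_gss_error(e):
    # In Python 3 the payload must be read through e.args, not e[...].
    major, minor = e.args[0], e.args[1]
    return "GSSAPI Error: %s/%s" % (major[0], minor[0])


# Same payload as in the agent log.
err = GSSError(
    ("Unspecified GSS failure. Minor code may provide more information", 851968),
    ("Cryptosystem internal error", -1765328206),
)

try:
    err[0][0]  # what the logging line in the traceback attempts
    subscript_failure = None
except TypeError as exc:
    subscript_failure = str(exc)

message = format_gss_error(err)
print(subscript_failure)
print(message)
```

So the TypeError only hides the real message; the GSS failure itself is what needs fixing.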
Created 10-23-2024 03:50 AM
Hello @sayebogbon
Thank you for reaching out.
Can you please confirm whether the nodes are heartbeating fine? You can check the heartbeat on the Hosts >> All Hosts page in the CM UI.
Regarding the Kerberos errors:
Can you please log in to the node's command line and try a manual kinit?
# cd /var/run/cloudera-scm-agent/process
Go into the latest HDFS process directory and try a manual kinit there.
Also compare /etc/krb5.conf with the copy on a working node.
Finally, has an OS upgrade been performed as well?
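In case it helps, here is a rough sketch of the manual check described above. It assumes the process directories are named `<number>-<service>-<ROLE>` (for example `102-hdfs-DATANODE`) and that the highest number is the latest one. The helper names, the keytab file name and the principal are placeholders for illustration only.

```python
import os
import re


def latest_process_dir(base, role_suffix):
    """Return the highest-numbered '<n>-<role_suffix>' directory under base, or None."""
    best = None
    for name in os.listdir(base):
        match = re.match(r"^(\d+)-(.+)$", name)
        path = os.path.join(base, name)
        if match and match.group(2) == role_suffix and os.path.isdir(path):
            number = int(match.group(1))
            if best is None or number > best[0]:
                best = (number, path)
    return best[1] if best else None


def kinit_command(process_dir, keytab_name, principal):
    """Build the kinit argument list for a keytab inside the process directory."""
    return ["kinit", "-k", "-t", os.path.join(process_dir, keytab_name), principal]
```

You could then call `latest_process_dir("/var/run/cloudera-scm-agent/process", "hdfs-DATANODE")` and pass the resulting command to `subprocess.run`, followed by `klist -e` to inspect the ticket.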
Created on 11-10-2024 03:28 PM - edited 11-10-2024 03:56 PM
Hi @upadhyayk04
Thanks for your response. I tried to reply to your message earlier, but with no luck. I hope this one gets through.
Firstly, the hosts are heartbeating.
Secondly, /etc/krb5.conf appears to be the same as on a working host (the Hue server host in this case). The Web Server Status issue is the same across HDFS, HBase, YARN, and Impala.
Thirdly, I had already tried the manual kinit, but I still get the same error.
After running a manual kinit (kinit -k -t hdfs.keytab hdfs/host.my-default-realm.com) from the latest DataNode process directory, I ran klist (klist -e) and got the following.
Valid starting Expires Service principal
10/11/24 23:43:47 11/11/24 09:43:47 krbtgt/my-default-realm.COM@my-default-realm.COM
renew until 17/11/24 23:43:47, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96
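To read the klist output above: the session key uses arcfour-hmac (RC4) while the ticket itself is encrypted with aes256-cts-hmac-sha1-96. Whether that mix is related to the GSS failure is not confirmed, but it is easy to check across hosts. A small sketch that pulls the pair out of `klist -e` text; the function name and the list of weak types are my own choices, not anything from Cloudera.

```python
import re

# Encryption types commonly treated as weak (illustrative list, not exhaustive).
WEAK_ETYPES = {"arcfour-hmac", "arcfour-hmac-md5", "rc4-hmac", "des-cbc-crc", "des-cbc-md5"}


def parse_etypes(klist_output):
    """Return a list of (session_key_etype, ticket_etype) pairs from `klist -e` text."""
    pattern = r"Etype \(skey, tkt\):\s*([^,\s]+),\s*(\S+)"
    return [(m.group(1), m.group(2)) for m in re.finditer(pattern, klist_output)]


# The two lines posted above.
sample = """10/11/24 23:43:47 11/11/24 09:43:47 krbtgt/my-default-realm.COM@my-default-realm.COM
renew until 17/11/24 23:43:47, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96"""

pairs = parse_etypes(sample)
weak_session_keys = [skey for skey, _ in pairs if skey in WEAK_ETYPES]
print(pairs)
print(weak_session_keys)
```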
Below are the Kerberos encryption types configured in the Cloudera Manager console.
Below is part of the host /etc/krb5.conf content.
[libdefaults]
renew_lifetime = 604800
ticket_lifetime = 36000
udp_preference_limit = 1
permitted_enctypes = rc4-hmac aes256-cts aes128-cts
default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
default_realm = my-default-realm.com
default_etypes = arcfour-hmac-md5
default_etypes_des = des-cbc-crc
allow_weak_crypto = true
forwardable = true
default_keytab_name = /etc/opt/quest/vas/host.keytab
[libvas]
site-name-override = iNET-LDAP
use-dns-srv = true
use-tcp-only = true
auth-helper-timeout = 60
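Since "seems to be the same" is hard to verify by eye, here is a sketch for comparing the enctype settings of two krb5.conf files. It only looks at the three enctype keys in [libdefaults] and maps the short names to their long forms (rc4-hmac and arcfour-hmac-md5 are aliases of arcfour-hmac in MIT Kerberos). The function names are hypothetical, and the parser is deliberately simple, so it ignores includes and nested braces.

```python
ALIASES = {
    "rc4-hmac": "arcfour-hmac",
    "arcfour-hmac-md5": "arcfour-hmac",
    "aes256-cts": "aes256-cts-hmac-sha1-96",
    "aes128-cts": "aes128-cts-hmac-sha1-96",
}
ENCTYPE_KEYS = ("permitted_enctypes", "default_tgs_enctypes", "default_tkt_enctypes")


def libdefaults_enctypes(conf_text):
    """Extract the enctype lists from the [libdefaults] section, with aliases normalized."""
    section = None
    result = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "libdefaults" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in ENCTYPE_KEYS:
                tokens = value.replace(",", " ").split()
                result[key] = [ALIASES.get(t, t) for t in tokens]
    return result


def diff_enctypes(a, b):
    """Return the keys whose enctype lists differ between two parsed configs."""
    return {k: (a.get(k), b.get(k)) for k in ENCTYPE_KEYS if a.get(k) != b.get(k)}


# Excerpt of the file posted above.
sample_conf = """[libdefaults]
permitted_enctypes = rc4-hmac aes256-cts aes128-cts
default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
default_etypes = arcfour-hmac-md5
[libvas]
use-tcp-only = true
"""
parsed = libdefaults_enctypes(sample_conf)
print(parsed["permitted_enctypes"])
```

Running `diff_enctypes` on the parsed files from this host and from the working Hue host should return an empty dict if the settings really match.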
Finally, the OS upgrade has not been performed yet. We're still on Red Hat OL7.
Created 11-18-2024 04:11 AM
I would appreciate any support from anyone
Created 12-15-2024 10:21 PM
Hello @sayebogbon
Apologies for the delayed response.
It seems the encryption types in the keytab do not match those on the KDC server.
I would suggest the following:
1.) Stop the cluster through CM
2.) Go to CM --> Administration --> Kerberos --> 'Kerberos Encryption Types', then add the following encryption types:
rc4-hmac aes256-cts aes128-cts
3.) Redeploy krb5.conf through CM
4.) Regenerate the keytabs and principals from the CM UI
5.) Start the cluster
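After step 4, one way to confirm that the regenerated keytab really contains the expected encryption types is to parse the output of `klist -kte <keytab>`. The sketch below assumes each entry line ends with the principal followed by the enctype in parentheses, which is the usual MIT klist layout; please verify against your own output. The function names are hypothetical.

```python
import re


def keytab_etypes(klist_kte_output):
    """Map each principal to the sorted list of enctypes found in `klist -kte` text."""
    found = {}
    for line in klist_kte_output.splitlines():
        match = re.search(r"(\S+@\S+)\s+\(([^)]+)\)\s*$", line)
        if match:
            etype = match.group(2)
            # Newer klist versions may prefix weak types with "DEPRECATED:".
            if etype.startswith("DEPRECATED:"):
                etype = etype[len("DEPRECATED:"):]
            found.setdefault(match.group(1), set()).add(etype)
    return {principal: sorted(etypes) for principal, etypes in found.items()}


def missing_etypes(found, principal, expected):
    """Return the expected enctypes that the keytab lacks for this principal."""
    present = set(found.get(principal, []))
    return [etype for etype in expected if etype not in present]
```

For the three types in step 2, the expected list in long form would be arcfour-hmac, aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96.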
Let us know how it goes
Created 12-16-2024 02:19 PM
Upgrading Cloudera Manager or CDP can sometimes alter TLS/SSL settings. Can you please verify whether TLS/SSL is enabled for the affected services:
Validate the keystore and truststore paths in the HDFS configuration:
Please check the above and report back.
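For a quick probe from the agent host, a sketch like the following can show whether a role's web endpoint answers over TLS and which certificate it presents. It takes the metric URLs from the agent log as input. The function names are hypothetical, and `ssl.get_server_certificate` is called without a CA file, so it retrieves the certificate but does not validate it.

```python
import ssl
from urllib.parse import urlsplit


def endpoint_from_url(url):
    """Return (host, port) for a metrics URL, falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def fetch_server_cert_pem(host, port):
    """Fetch the server certificate in PEM form (no validation is performed)."""
    return ssl.get_server_certificate((host, port))
```

For example, `fetch_server_cert_pem(*endpoint_from_url("https://host.domain.com:9865/jmx"))` should return a PEM string if the DataNode web UI is serving TLS, and raise an error otherwise.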
happy hadooping !!!!!