Created on 11-18-2024 04:27 AM - edited 11-18-2024 09:18 AM
I recently upgraded from CM 7.6.7 to CM 7.11.3 and from CDP 7.1.7 SP2 to CDP 7.1.7 SP3.
The HDFS DataNode, Impala Daemon, YARN Resource Manager, and HBase RegionServer roles are reporting an unhealthy Web Server Status in Cloudera Manager.
After checking one of the agent logs, I found the following errors.
[18/Nov/2024 09:09:09 +0100] 2414 GM IMPALAD throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:25000/jsonmetrics?json'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1244, in http_error_401
retry = self.http_error_auth_reqed('www-authenticate',
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1124, in http_error_auth_reqed
return self.retry_http_digest_auth(req, authreq)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1138, in retry_http_digest_auth
resp = self.parent.open(req, timeout=req.timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 388, in http_error_default
raise e
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 382, in http_error_default
return old(self, req, fp, code, msg, hdrs)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
[18/Nov/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[18/Nov/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[18/Nov/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon WARNING Monitor slow to respond in readiness check: 45s GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07
[18/Nov/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon INFO Monitor expired: ('GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07',)
[18/Nov/2024 09:09:55 +0100] 2414 GM NODEMANAGER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61006/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[18/Nov/2024 09:09:55 +0100] 2414 GM DATANODE throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:9865/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[18/Nov/2024 09:09:55 +0100] 2414 GM REGIONSERVER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61005/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
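A note on the final TypeError in the tracebacks above: in Python 3, exception objects cannot be indexed, so an expression like e[0][0] raises a TypeError while the original GSSError is being handled and hides it. The sketch below reproduces this with a stand-in exception class (FakeGSSError is hypothetical, not the real kerberos.GSSError) and shows that the same tuples are reachable through e.args. It only illustrates the language behaviour; it is not the library's actual fix.

```python
class FakeGSSError(Exception):
    """Hypothetical stand-in for kerberos.GSSError, used only for illustration."""


def describe(exc):
    # Python 3 exceptions keep their constructor arguments in .args;
    # indexing the exception object itself (exc[0]) is not supported.
    major, minor = exc.args
    return "GSSAPI Error: %s/%s" % (major[0], minor[0])


err = FakeGSSError(
    ("Unspecified GSS failure. Minor code may provide more information", 851968),
    ("Cryptosystem internal error", -1765328206),
)

try:
    err[0][0]  # what the logging call in the traceback attempts
    subscript_failed = False
except TypeError:
    subscript_failed = True

message = describe(err)
print(subscript_failed)
print(message)
```

The underlying problem in the log is therefore the first exception (the GSS failure with "Cryptosystem internal error"), not the TypeError that follows it.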
I have tried everything I could think of, but with no luck.
Firstly, the hosts are heartbeating.
Secondly, /etc/krb5.conf appears to be identical to the one on a working host (the Hue server host in this case). The Web Server Status issue is the same across HDFS, HBase, YARN, and Impala.
Thirdly, I tried a manual kinit, but the same error is still thrown.
After running a manual kinit (kinit -k -t hdfs.keytab hdfs/host.my-default-realm.com) from the latest DataNode process directory, I ran klist -e and got the following.
[root@host 1546506889-hdfs-DATANODE]# klist -e
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: HTTP/host@EXAMPLE-REALM.com
Valid starting Expires Service principal
18/11/24 16:59:35 19/11/24 02:59:34 krbtgt/host@EXAMPLE-REALM.COM
renew until 25/11/24 16:59:34, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96
Below are the Kerberos encryption types configured in the Cloudera Manager console.
Below is part of the host /etc/krb5.conf content.
[libdefaults]
renew_lifetime = 604800
ticket_lifetime = 36000
udp_preference_limit = 1
permitted_enctypes = rc4-hmac aes256-cts aes128-cts
default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
default_realm = my-default-realm.com
default_etypes = arcfour-hmac-md5
default_etypes_des = des-cbc-crc
allow_weak_crypto = true
forwardable = true
default_keytab_name = /etc/opt/quest/vas/host.keytab
[libvas]
site-name-override = iNET-LDAP
use-dns-srv = true
use-tcp-only = true
auth-helper-timeout = 60
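The enctype settings in the [libdefaults] section above can be listed with a small script, which makes it easier to compare hosts. This is only a sketch: KRB5_TEXT copies four lines from the post, and the WEAK_PREFIXES tuple is an assumption about which names to flag (RC4 and single DES). Whether these enctypes matter for this particular issue is not established in the thread.

```python
# Sample copied from the [libdefaults] section quoted in the post.
KRB5_TEXT = """
[libdefaults]
permitted_enctypes = rc4-hmac aes256-cts aes128-cts
default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
allow_weak_crypto = true
"""

# Assumption: names treated as legacy/weak for this check (RC4 and single DES).
WEAK_PREFIXES = ("rc4", "arcfour", "des-")


def enctype_settings(text):
    """Return {setting: [enctypes]} for every *_enctypes key in the text."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().endswith("_enctypes"):
            found[key.strip()] = value.split()
    return found


def weak_enctypes(settings):
    """Return the sorted configured enctypes that match WEAK_PREFIXES."""
    names = {name for values in settings.values() for name in values}
    return sorted(n for n in names if n.startswith(WEAK_PREFIXES))


settings = enctype_settings(KRB5_TEXT)
weak = weak_enctypes(settings)
print(settings)
print(weak)
```

To check a real host, the text could be read from /etc/krb5.conf instead of the inline sample.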
Finally, the OS upgrade has not been performed yet. We are still on RHEL/OL7.
I know you're busy but any support will be much appreciated.
Thanks,
Stephen
Created 12-20-2024 01:40 AM
I think the problem was partly caused by our Python 3.8 installation, which we had done via Anaconda.
Cloudera recommended that we use yum to install rh-python38 on our RHEL/OL7 hosts, as I mentioned in my previous message. The documentation is here: Installing Python 3.8 standard package on RHEL 7 | CDP Private Cloud. That installation resolved most of the Web Server issues.
The Web Server issue for Impala was related not only to the Python installation but also to the Web Server username and password.
Below are the actions performed to resolve the Impala Web Server issue after enabling hadoop_secure_web_ui.
WORK PERFORMED:
Also, for Impala, this Cloudera documentation was quite helpful: Configuring Impala Web UI | CDP Public Cloud
The issue is now resolved after following the instructions in the above documentation.
Created 11-18-2024 08:41 AM
Hello @sayebogbon ,
Based on the error in the log you shared:
opened_url = urlopen_with_retry_on_authentication_errors
And the klist output showing this:
Valid starting Expires
10/11/24 23:43:47 11/11/24 09:43:47
It looks like you need to regenerate the Kerberos credentials for this host.
To do so, please stop all services on this host.
Then go to CM > Administration > Security > Kerberos Credentials.
In the search bar, type the hostname and select all the principals that appear, then click the Regenerate Selected button.
If there are no problems, new credentials should be generated.
Restart your services and let us know if that helps.
Created on 11-18-2024 09:41 AM - edited 11-18-2024 09:43 AM
Apologies, that was the wrong ticket; I should have changed it. I have updated it now.
I had previously regenerated both the keytabs and the Kerberos credentials many times, but with no luck.
Also, after manually obtaining a Kerberos ticket with kinit -k -t /var/run/cloudera-scm-agent/process/1546506889-hdfs-DATANODE/hdfs.keytab HTTP/host@EXAMPLE-REALM.COM, I was able to run curl against the DataNode web URL (https://fqdn:9865) and got a 200 OK response. However, it seems that Cloudera isn't able to detect the credential for some reason.
See the response below.
[root@host 1546506889-hdfs-DATANODE]# curl -v -k --negotiate -u : https://host.com:9865
* About to connect() to host.com port 9865 (#0)
* Trying xx.xx.xxx.xx...
* Connected to host.com (xx.xx.xxx.xx) port 9865 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
* start date: Nov 14 14:41:16 2024 GMT
* expire date: Nov 09 14:41:16 2025 GMT
* common name: host.com
* issuer: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: host.com:9865
> Accept: */*
>
< HTTP/1.1 401 Authentication required
< Connection: close
< Pragma: no-cache
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Pragma: no-cache
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< WWW-Authenticate: Negotiate
< Set-Cookie: hadoop.auth=; Path=/; HttpOnly
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 447
<
* Closing connection 0
* Issue another request to this URL: 'https://host.com:9865/'
* About to connect() to host.com port 9865 (#1)
* Trying xx.xx.xxx.xx...
* Connected to host.com (xx.xx.xxx.xx) port 9865 (#1)
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
* start date: Nov 14 14:41:16 2024 GMT
* expire date: Nov 09 14:41:16 2025 GMT
* common name: host.com
* issuer: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
* Server auth using GSS-Negotiate with user ''
> GET / HTTP/1.1
> Authorization: Negotiate xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxg==
> User-Agent: curl/7.29.0
> Host: host.com:9865
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: close
< Date: Mon, 18 Nov 2024 17:07:11 GMT
< Cache-Control: no-cache
< Expires: Mon, 18 Nov 2024 17:07:11 GMT
< Date: Mon, 18 Nov 2024 17:07:11 GMT
< Pragma: no-cache
< Content-Type: text/html
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Expires: Mon, 18 Nov 2024 17:07:11 GMT
< Date: Mon, 18 Nov 2024 17:07:11 GMT
< Pragma: no-cache
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< WWW-Authenticate: Negotiate xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
< Set-Cookie: hadoop.auth="u=HTTP&p=HTTP/host.com.COM&t=kerberos&e=17xxxxxx7&s=CaYM+xxxxxxxxxxfBXleJ0K/ObFbrjALqy/R//g="; Path=/; HttpOnly
< Last-Modified: Fri, 30 Aug 2024 16:14:30 GMT
< Accept-Ranges: bytes
< Content-Length: 1085
<
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="REFRESH" content="0;url=datanode.html" />
<title>Hadoop Administration</title>
</head>
* Closing connection 1
</html>[root@host 1546506889-hdfs-DATANODE]#
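The Set-Cookie header in the 200 response shows which identity the DataNode web server accepted. Below is a sketch that splits the hadoop.auth value into its fields. The cookie string is copied from the curl output above with the masked characters left masked, and reading u, p, t, e, and s as user, principal, auth type, expiry, and signature is an assumption about Hadoop's auth cookie rather than something stated in the thread.

```python
def parse_hadoop_auth(cookie_value):
    """Split a hadoop.auth cookie value into a dict of its key=value fields."""
    fields = {}
    for part in cookie_value.strip('"').split("&"):
        # Split on the first "=" only, so base64 padding in the value survives.
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


# Value copied from the curl output above (masked characters left masked).
COOKIE = '"u=HTTP&p=HTTP/host.com.COM&t=kerberos&e=17xxxxxx7&s=CaYM+xxxxxxxxxxfBXleJ0K/ObFbrjALqy/R//g="'

auth = parse_hadoop_auth(COOKIE)
print(auth["u"], auth["p"], auth["t"])
```

Here the t field is kerberos and the p field is the HTTP principal, which matches the manual kinit performed before the curl call.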
Created 12-08-2024 12:34 AM
We contacted Cloudera Support, and they recommended installing the standard Python 3.8 package for OL7. We followed this documentation: Installing Python 3.8 standard package on RHEL 7 | CDP Private Cloud.
The Web Server issue for HDFS, YARN, and HBase disappeared. However, the Web Server issue and the HTTP error for Impala persist.
[08/Dec/2024 07:29:15 +0000] 28735 ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching metrics at 'https://host-exle.com:25000/jsonmetrics?json'
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 1244, in http_error_401
retry = self.http_error_auth_reqed('www-authenticate',
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 1124, in http_error_auth_reqed
return self.retry_http_digest_auth(req, authreq)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 1138, in retry_http_digest_auth
resp = self.parent.open(req, timeout=req.timeout)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 388, in http_error_default
raise e
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 382, in http_error_default
return old(self, req, fp, code, msg, hdrs)
File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
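With tracebacks this long, it can help to reduce the agent log to one line per failing endpoint. The following is a rough sketch that pairs each "Error fetching metrics at" record with the last line of the traceback that follows it. The regular expression is inferred from the log lines quoted in this thread, not from any documented agent log format, and SAMPLE is an abbreviated copy of those lines.

```python
import re

ERROR_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<pid>\d+) (?P<thread>.+?) (?P<logger>\S+) "
    r"(?P<level>[A-Z]+) Error fetching metrics at '(?P<url>[^']+)'"
)


def failing_endpoints(lines):
    """Return a list of (thread, url, last_traceback_line) per failed fetch."""
    results = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ERROR_RE.match(line)
        if match:
            current = [match.group("thread"), match.group("url"), ""]
            results.append(current)
        elif line.startswith("["):
            current = None  # a new, unrelated log record ends the traceback
        elif current is not None and line.strip():
            current[2] = line.strip()  # keep overwriting; the last line wins
    return [tuple(item) for item in results]


SAMPLE = """\
[08/Dec/2024 07:29:15 +0000] 28735 ImpalaDaemonQueryMonitoring throttling_logger ERROR Error fetching metrics at 'https://host-exle.com:25000/jsonmetrics?json'
Traceback (most recent call last):
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
[18/Nov/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[18/Nov/2024 09:09:55 +0100] 2414 GM NODEMANAGER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61006/jmx'
Traceback (most recent call last):
TypeError: 'GSSError' object is not subscriptable
""".splitlines()

report = failing_endpoints(SAMPLE)
for thread, url, error in report:
    print(thread, url, error, sep=" | ")
```

On a real host the lines would come from the agent log file rather than the inline sample.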
Created 12-11-2024 04:09 AM
Hi @sayebogbon
Could you restart the CM agent on the hosts where the Impala daemon is in bad health, restart the Service Monitor from CM, and then check again?
Regards,
Chethan YM
Created on 12-11-2024 04:54 AM - edited 12-11-2024 04:58 AM
Hi @ChethanYM ,
Thanks for your input. We have managed to resolve the web server issue by disabling hadoop_secure_web_ui.
The only remaining problem is that when we check the agent status with systemctl status cloudera-scm-agent, it reports urllib.error.HTTPError: HTTP Error 401: Unauthorized, as you can see below. Cloudera Support recommended that I remove /opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py from rh-python38 so that the agent would be forced to use the request.py from its own Python package. However, after I removed it, I was unable to start the agent again. I reported this to them, and they held a session with me in which I uninstalled and reinstalled the agent, but nothing has worked so far.
I installed rh-python38 on our RHEL/OL7 system by following this documentation: Installing Python 3.8 standard package on RHEL 7 | CDP Private Cloud. This is the Python version the agent is running on.
Note: the HTTP error is not reported in /var/log/cloudera-scm-agent/cloudera-scm-agent.log. It is only reported when I check the status of the agent. Also, only a few hosts (DataNode, YARN, HDFS, and some others) have the issue.
[root@host-exle ~]# systemctl status cloudera-scm-agent
● cloudera-scm-agent.service - Cloudera Manager Agent Service
Loaded: loaded (/usr/lib/systemd/system/cloudera-scm-agent.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2024-12-09 19:30:21 GMT; 13h ago
Main PID: 18725 (cmagent)
CGroup: /system.slice/cloudera-scm-agent.service
└─18725 /usr/bin/python3.8 /opt/cloudera/cm-agent/bin/cm agent
Dec 09 19:30:33 host-exle cm[18725]: return self._call_chain(*args)
Dec 09 19:30:33 host-exle cm[18725]: File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 502, in _call_chain
Dec 09 19:30:33 host-exle cm[18725]: result = func(*args)
Dec 09 19:30:33 host-exle cm[18725]: File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 388, in http_error_default
Dec 09 19:30:33 host-exle cm[18725]: raise e
Dec 09 19:30:33 host-exle cm[18725]: File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 382, in http_error_default
Dec 09 19:30:33 host-exle cm[18725]: return old(self, req, fp, code, msg, hdrs)
Dec 09 19:30:33 host-exle cm[18725]: File "/opt/rh/rh-python38/root/usr/lib64/python3.8/urllib/request.py", line 649, in http_error_default
Dec 09 19:30:33 host-exle cm[18725]: raise HTTPError(req.full_url, code, msg, hdrs, fp)
Dec 09 19:30:33 host-exle cm[18725]: urllib.error.HTTPError: HTTP Error 401: Unauthorized
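Since the thread revolves around which Python 3.8 installation the agent picks up (Anaconda versus rh-python38), a quick check is to ask the interpreter itself where it and its urllib.request module live. This is a minimal sketch; it assumes it is run with the same interpreter the agent uses (/usr/bin/python3.8 according to the systemctl output above), and the function name interpreter_report is made up for this example.

```python
import os
import sys
import urllib.request


def interpreter_report():
    """Collect where this interpreter and its urllib.request module live."""
    executable = sys.executable or ""
    return {
        "executable": executable,
        # Resolve symlinks, e.g. /usr/bin/python3.8 pointing into another tree.
        "real_executable": os.path.realpath(executable),
        "version": "%d.%d" % sys.version_info[:2],
        "urllib_request": os.path.realpath(urllib.request.__file__),
    }


report = interpreter_report()
for name, value in report.items():
    print("%-16s %s" % (name, value))
```

If urllib_request points into an Anaconda directory on one host and into /opt/rh/rh-python38 on another, the hosts are not running the agent on the same Python installation.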