Member since: 10-05-2024
Posts: 18
Kudos received: 9
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 610 | 10-21-2024 01:45 AM
11-18-2024
09:41 AM
Apologies, that was the wrong ticket; I should have changed it, and I have updated it now. Previously, I had regenerated both the keytabs and the Kerberos credentials many times, but no luck. Also, after I manually obtained a Kerberos ticket with kinit -k -t /var/run/cloudera-scm-agent/process/1546506889-hdfs-DATANODE/hdfs.keytab HTTP/host@EXAMPLE-REALM.COM, I was able to use curl against the DataNode web UI (https://fqdn:9865) and got a 200 OK response. However, it seems that Cloudera Manager isn't able to detect the credential for some reason. See the response below.
[root@host 1546506889-hdfs-DATANODE]# curl -v -k --negotiate -u : https://host.com:9865
* About to connect() to host.com port 9865 (#0)
* Trying xx.xx.xxx.xx...
* Connected to host.com (xx.xx.xxx.xx) port 9865 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
* start date: Nov 14 14:41:16 2024 GMT
* expire date: Nov 09 14:41:16 2025 GMT
* common name: host.com
* issuer: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: host.com:9865
> Accept: */*
>
< HTTP/1.1 401 Authentication required
< Connection: close
< Pragma: no-cache
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Pragma: no-cache
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< WWW-Authenticate: Negotiate
< Set-Cookie: hadoop.auth=; Path=/; HttpOnly
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 447
<
* Closing connection 0
* Issue another request to this URL: 'https://host.com:9865/'
* About to connect() to host.com port 9865 (#1)
* Trying xx.xx.xxx.xx...
* Connected to host.com (xx.xx.xxx.xx) port 9865 (#1)
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
* start date: Nov 14 14:41:16 2024 GMT
* expire date: Nov 09 14:41:16 2025 GMT
* common name: host.com
* issuer: CN=host.com,OU=Technology,O=xxxx plc,L=xl,ST=xl,C=GB
* Server auth using GSS-Negotiate with user ''
> GET / HTTP/1.1
> Authorization: Negotiate xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxg==
> User-Agent: curl/7.29.0
> Host: host.com:9865
> Accept: */*
>
< HTTP/1.1 200 OK
< Connection: close
< Date: Mon, 18 Nov 2024 17:07:11 GMT
< Cache-Control: no-cache
< Expires: Mon, 18 Nov 2024 17:07:11 GMT
< Date: Mon, 18 Nov 2024 17:07:11 GMT
< Pragma: no-cache
< Content-Type: text/html
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Expires: Mon, 18 Nov 2024 17:07:11 GMT
< Date: Mon, 18 Nov 2024 17:07:11 GMT
< Pragma: no-cache
< Strict_Transport_Security: max-age=0; includeSubDomains
< X-Content-Type-Options: nosniff
< X-FRAME-OPTIONS: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< WWW-Authenticate: Negotiate xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
< Set-Cookie: hadoop.auth="u=HTTP&p=HTTP/host.com.COM&t=kerberos&e=17xxxxxx7&s=CaYM+xxxxxxxxxxfBXleJ0K/ObFbrjALqy/R//g="; Path=/; HttpOnly
< Last-Modified: Fri, 30 Aug 2024 16:14:30 GMT
< Accept-Ranges: bytes
< Content-Length: 1085
<
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="REFRESH" content="0;url=datanode.html" />
<title>Hadoop Administration</title>
</head>
* Closing connection 1
</html>
[root@host 1546506889-hdfs-DATANODE]#
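For reference, the hadoop.auth cookie in the 200 response records which principal the DataNode accepted. As far as I know, Hadoop's authentication filter serializes that token as &-separated fields (u = short user name, p = principal, t = auth type, e = expiry in milliseconds since the epoch, s = signature). The Python sketch below assumes that layout and simply splits the value so the principal can be compared with the one Cloudera Manager is expected to use. parse_hadoop_auth and expiry_utc are hypothetical helper names, and the sample is the redacted cookie from the output above, so its masked expiry is reported as unreadable.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def parse_hadoop_auth(cookie_value: str) -> Dict[str, str]:
    """Split a hadoop.auth cookie value into its key=value fields.

    Assumes the u=...&p=...&t=...&e=...&s=... layout seen in the curl output.
    Only the first '=' of each field is treated as the separator, because the
    base64 signature (s=) may itself end in '='.
    """
    value = cookie_value.strip().strip('"')
    fields: Dict[str, str] = {}
    for part in value.split("&"):
        key, sep, val = part.partition("=")
        if sep:  # ignore fragments that are not key=value
            fields[key] = val
    return fields


def expiry_utc(fields: Dict[str, str]) -> Optional[datetime]:
    """Convert the e= field (assumed epoch milliseconds) to UTC, if readable."""
    raw = fields.get("e", "")
    if not raw.isdigit():  # e.g. the redacted value "17xxxxxx7"
        return None
    return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    cookie = ('"u=HTTP&p=HTTP/host.com.COM&t=kerberos&e=17xxxxxx7'
              '&s=CaYM+xxxxxxxxxxfBXleJ0K/ObFbrjALqy/R//g="')
    f = parse_hadoop_auth(cookie)
    print(f["u"], f["p"], f["t"], expiry_utc(f))
```

Splitting by hand rather than with urllib.parse.parse_qsl is deliberate: parse_qsl would turn the '+' characters in the signature into spaces.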
11-18-2024
04:27 AM
1 Kudo
I recently upgraded from CM 7.6.7 to CM 7.11.3, and from CDP 7.1.7 SP2 to CDP 7.1.7 SP3. The HDFS DataNode, Impala Daemon, YARN ResourceManager, and HBase RegionServer roles are showing an unhealthy web server status in Cloudera Manager. After checking one of the agent logs, I found the following error:
[18/Nov/2024 09:09:09 +0100] 2414 GM IMPALAD throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:25000/jsonmetrics?json'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1244, in http_error_401
retry = self.http_error_auth_reqed('www-authenticate',
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1124, in http_error_auth_reqed
return self.retry_http_digest_auth(req, authreq)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1138, in retry_http_digest_auth
resp = self.parent.open(req, timeout=req.timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 388, in http_error_default
raise e
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 382, in http_error_default
return old(self, req, fp, code, msg, hdrs)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
[18/Nov/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[18/Nov/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[18/Nov/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon WARNING Monitor slow to respond in readiness check: 45s GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07
[18/Nov/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon INFO Monitor expired: ('GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07',)
[18/Nov/2024 09:09:55 +0100] 2414 GM NODEMANAGER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61006/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[18/Nov/2024 09:09:55 +0100] 2414 GM DATANODE throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:9865/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[18/Nov/2024 09:09:55 +0100] 2414 GM REGIONSERVER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61005/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable

I have tried everything I could, but no luck. Firstly, the hosts are heartbeating. Secondly, /etc/krb5.conf appears to be the same as on another host that is working (the Hue server host in this case). The Web Server Status issue is the same across HDFS, HBase, YARN, and Impala. Thirdly, I had tried a manual kinit before, but it still throws the same error. After a manual kinit (kinit -k -t hdfs.keytab hdfs/host.my-default-realm.com) from the latest DataNode process directory, I ran klist -e and got the following:
[root@host 1546506889-hdfs-DATANODE]# klist -e
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: HTTP/host@EXAMPLE-REALM.com
Valid starting Expires Service principal
18/11/24 16:59:35 19/11/24 02:59:34 krbtgt/host@EXAMPLE-REALM.COM
renew until 25/11/24 16:59:34, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96

[Screenshot: Kerberos Encryption Types configured in the Cloudera Manager console]

Below is part of the host's /etc/krb5.conf:
[libdefaults]
renew_lifetime = 604800
ticket_lifetime = 36000
udp_preference_limit = 1
permitted_enctypes = rc4-hmac aes256-cts aes128-cts
default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
default_realm = my-default-realm.com
default_etypes = arcfour-hmac-md5
default_etypes_des = des-cbc-crc
allow_weak_crypto = true
forwardable = true
default_keytab_name = /etc/opt/quest/vas/host.keytab
[libvas]
site-name-override = iNET-LDAP
use-dns-srv = true
use-tcp-only = true
auth-helper-timeout = 60

Finally, the OS upgrade has not been performed yet; we're still on Red Hat OL7. I know you're busy, but any support would be much appreciated.

Thanks,
Stephen
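The TypeError at the end of each traceback is a secondary bug that hides the real Kerberos error. In Python 3, exception objects are no longer indexable, so the log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0])) line in urllib_kerberos/__init__.py (line 174) fails before it can log anything; the underlying GSSError (851968 / 'Cryptosystem internal error') is still the failure to investigate. The sketch below reproduces the pattern with a stand-in exception class (FakeGSSError is hypothetical, built with the same tuple-of-tuples arguments shown in the log) and shows the Python 3 form, which reads the same data from e.args.

```python
import logging

log = logging.getLogger("gssapi-demo")


class FakeGSSError(Exception):
    """Hypothetical stand-in for kerberos.GSSError.

    Raised with the same ((major_msg, major_code), (minor_msg, minor_code))
    arguments that appear in the agent log.
    """


def format_gss_error(e: BaseException) -> str:
    """Python 3 version of the failing log line: read the tuples from e.args."""
    major, minor = e.args[0], e.args[1]
    return "GSSAPI Error: %s/%s" % (major[0], minor[0])


def demo() -> str:
    try:
        raise FakeGSSError(
            ("Unspecified GSS failure. Minor code may provide more information", 851968),
            ("Cryptosystem internal error", -1765328206),
        )
    except FakeGSSError as e:
        try:
            e[0][0]  # Python 2 idiom from the traceback; not valid on Python 3 exceptions
        except TypeError as te:
            log.debug("old idiom fails: %s", te)
        message = format_gss_error(e)
        log.critical(message)
        return message


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(demo())
```

This only repairs the logging: with it, the agent log would show the major/minor GSS messages instead of the TypeError, but the 'Cryptosystem internal error' itself would still need to be resolved.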
11-18-2024
04:11 AM
1 Kudo
I would appreciate any support from anyone.
11-14-2024
10:43 AM
Thanks for getting back to me. The process_timestamp file isn't there, and it isn't present for the other running processes either. I had tried the workaround and it didn't work, but I will give it another go. Another thing: the soft link for the RegionServer process does not exist in the /var/run/cloudera-scm-agent/supervisor/include directory.
11-11-2024
10:18 PM
1 Kudo
The issue was resolved after I rebooted the host; I believe the reboot did the same things you mentioned above. I can now start the DataNode, Impala Daemon, and YARN. However, I am still unable to start the HBase RegionServer. I'm getting the following output, and I believe it's related to the znode file not existing in the process directory.
+ echo 'Adding HBoss JARs to HBase service classpath'
+ znode_cleanup regionserver
+ export 'HBASE_CLASSPATH=/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ HBASE_CLASSPATH='/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ exec /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase/../../bin/hbase --config /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER regionserver start
++ date
+ echo 'Tue 12 Nov 06:03:50 GMT 2024 Starting znode cleanup thread with HBASE_ZNODE_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 for regionserver'
++ replace_pid -Djava.net.preferIPv4Stack=true
++ echo -Djava.net.preferIPv4Stack=true
++ sed 's#{{PID}}#14618#g'
+ HBASE_OPTS=-Djava.net.preferIPv4Stack=true
+ '[' jaas.conf '!=' '' ']'
+ export 'HBASE_OPTS=-Djava.security.auth.login.config=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/jaas.conf -Djava.net.preferIPv4Stack=true'
+ HBASE_OPTS='-Djava.security.auth.login.config=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/jaas.conf -Djava.net.preferIPv4Stack=true'
+ LOG_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs/znode_cleanup.log
+ set +x
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
/opt/cloudera/cm-agent/service/hbase/hbase.sh: line 234: kill: (14618) - No such process
+ RET=0
+ '[' -f /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 ']'
++ date
+ echo 'Tue 12 Nov 06:03:56 GMT 2024 Znode file does not exist. No cleanup required.'
+ exit 0

Below is the agent log:
[12/Nov/2024 05:53:11 +0000] 1559 MainThread heartbeat_tracker INFO HB stats (seconds): num:43 LIFE_MIN:0.08 min:0.04 mean:0.06 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:12 +0000] 1559 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.04 min:0.04 mean:0.07 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:16 +0000] 1559 CP Server WorkerThread _cplogging INFO 127.0.0.1 - - [12/Nov/2024:06:03:16] "GET /heartbeat HTTP/1.1" 200 2 "" "python-requests/2.26.0"
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] Updating process (remove).
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] Orphaning process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process ERROR Error creating marker /var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 1302, in mark_orphan
f = open(marker, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp'
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using specific audit plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Creating metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using specific metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using generic metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Creating profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using generic profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Instantiating process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Updating process: True {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO First time to activate the process [1546503485-hbase-REGIONSERVER].
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Creating cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Creating cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Creating cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Refreshing process files: None
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO /opt/cloudera/cmlib/postgresql-connector.jar doesn't exists! Trying to find /usr/share/java/postgresql-connector-java.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO /usr/share/java/postgresql-connector-java.jar doesn't exists! Trying to find a postgres jar of the pattern /opt/cloudera/cmlib/postgres*.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO prepare_environment begin: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': '3.2.3.3.2.7172000.0-334-1.p0.37609510'}, ['cdh'], ['hdfs-client-plugin', 'cdh-plugin', 'hbase-plugin']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO The following requested parcels are not available: {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO Obtained tags ['cdh', 'impala', 'sentry', 'solr', 'spark', 'kafka', 'kudu'] for parcel CDH
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO Obtained tags ['spark3'] for parcel SPARK3
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel_patch INFO Patched parcel in /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125 for python3 compatibility.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO prepare_environment end: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125'}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue __init__ INFO Extracted 19 files and 0 dirs to /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue throttling_logger INFO Added principal HTTP/host.com with keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: cpu
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.shares with value 400
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.rt_runtime_us with value 1000
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: io
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Reconfiguring cgroup pseudofile /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER/blkio.weight with value 200
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: memory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: directory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: tcp_listen
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO reading limits: {'limit_fds': 32768, 'limit_memlock': None}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Launching process. one-off False, command hbase/hbase.sh, args ['regionserver', 'start']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue supervisor WARNING Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 1546503485-hbase-REGIONSERVER'>)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue supervisor INFO Triggering supervisord update.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin audit plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin metadata plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin profile plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon INFO Instantiating generic monitor for service HBASE and role REGIONSERVER
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin monitor refresh.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue abstract_monitor INFO Refreshing GenericMonitor HBASE-REGIONSERVER for None
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon INFO New monitor: (<cmf.monitor.generic.GenericMonitor object at 0x7f727379a2b0>,)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Daemon refresh complete for process 1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:20 +0000] 1559 Profile-Plugin navigator_plugin INFO Pipelines updated for Profile Plugin: set()
[12/Nov/2024 06:03:20 +0000] 1559 Audit-Plugin navigator_plugin INFO Pipelines updated for Audit Plugin: []
[12/Nov/2024 06:03:20 +0000] 1559 Metadata-Plugin navigator_plugin INFO Pipelines updated for Metadata Plugin: []
[12/Nov/2024 06:03:57 +0000] 1559 MainThread process INFO [1546503485-hbase-REGIONSERVER] Unregistered supervisor process FATAL
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups INFO Destroying cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups INFO Destroying cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups INFO Destroying cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:59 +0000] 1559 MainThread supervisor INFO Triggering supervisord update.
[12/Nov/2024 06:03:59 +0000] 1559 MainThread throttling_logger INFO Removed keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit from
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Updating process: False {'run_generation': (1, 2), 'running': (True, False)}
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:04:15 +0000] 1559 Profile-Plugin navigator_plugin INFO stopping Profile Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Audit-Plugin navigator_plugin INFO stopping Audit Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Metadata-Plugin navigator_plugin INFO stopping Metadata Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:18 +0000] 1559 MonitorDaemon-Scheduler daemon INFO Monitor expired: ('GenericMonitor HBASE-REGIONSERVER for hbase-REGIONSERVER-78fd4f39bfc69a473cc5abed13e41dac',)
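As I read the trace above, hbase.sh reports kill: (14618) - No such process, which suggests the RegionServer JVM with PID 14618 had already exited when the cleanup step ran; the agent log then shows supervisord marking the process FATAL. The message 'Znode file does not exist. No cleanup required.' appears to be the cleanup's normal no-op branch rather than the cause. The Python sketch below mirrors that decision under those assumptions; it is a hypothetical re-implementation for illustration, not the contents of hbase.sh. It probes the PID with signal 0 (what kill -0 does) and only then looks for a znode<PID> file in the process directory.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (the `kill -0 PID` check).

    Signal 0 is never delivered; the kernel only checks that the PID exists
    and that we may signal it. Linux/POSIX only.
    """
    if pid <= 0:
        raise ValueError("pid must be positive")  # 0 and negatives address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:  # ESRCH, printed by the shell as "No such process"
        return False
    except PermissionError:  # EPERM: the process exists but belongs to another user
        return True
    return True


def znode_cleanup_status(process_dir: str, pid: int) -> str:
    """Hypothetical mirror of the cleanup decision visible in the hbase.sh trace."""
    if pid_alive(pid):
        return f"PID {pid} is still running; cleanup waits for it to exit"
    znode_file = Path(process_dir) / f"znode{pid}"  # e.g. .../znode14618 in the trace
    if not znode_file.is_file():
        return "Znode file does not exist. No cleanup required."
    return f"Znode file {znode_file} found; cleanup would run"


if __name__ == "__main__":
    # Start and reap a short-lived child so we have a PID that no longer exists.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    with tempfile.TemporaryDirectory() as process_dir:
        print(pid_alive(os.getpid()))
        print(znode_cleanup_status(process_dir, child.pid))
```

If this reading is right, the more useful evidence is in the RegionServer's own role log under the process directory, which would say why the JVM exited.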
11-10-2024
03:28 PM
Hi @upadhyayk04, thanks for your response. I tried to reply to your message before without success; I hope this one gets through. Firstly, the hosts are heartbeating. Secondly, /etc/krb5.conf appears to be the same as on another host that is working (the Hue server host in this case). The Web Server Status issue is the same across HDFS, HBase, YARN, and Impala. Thirdly, I had tried a manual kinit before, but it still throws the same error. After a manual kinit (kinit -k -t hdfs.keytab hdfs/host.my-default-realm.com) from the latest DataNode process directory, I ran klist -e and got the following:
Valid starting Expires Service principal
10/11/24 23:43:47 11/11/24 09:43:47 krbtgt/my-default-realm.COM@my-default-realm.COM
renew until 17/11/24 23:43:47, Etype (skey, tkt): arcfour-hmac, aes256-cts-hmac-sha1-96

[Screenshot: Kerberos Encryption Types configured in the Cloudera Manager console]

Below is part of the host's /etc/krb5.conf:
[libdefaults]
renew_lifetime = 604800
ticket_lifetime = 36000
udp_preference_limit = 1
permitted_enctypes = rc4-hmac aes256-cts aes128-cts
default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
default_tkt_enctypes = rc4-hmac aes256-cts aes128-cts
default_realm = my-default-realm.com
default_etypes = arcfour-hmac-md5
default_etypes_des = des-cbc-crc
allow_weak_crypto = true
forwardable = true
default_keytab_name = /etc/opt/quest/vas/host.keytab
[libvas]
site-name-override = iNET-LDAP
use-dns-srv = true
use-tcp-only = true
auth-helper-timeout = 60
Finally, the OS upgrade has not been performed yet; we're still on Red Hat OL7.
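To compare the enctype lists in krb5.conf with the encryption types enabled in Cloudera Manager, a small Python sketch like the one below can pull the enctype settings out of [libdefaults] and flag RC4 (rc4-hmac / arcfour-hmac) and single-DES entries; note that the klist -e output above already shows an arcfour-hmac session key. This is a simplified, hypothetical helper written for illustration, not a Cloudera tool: it only understands flat key = value lines and ignores every section other than [libdefaults].

```python
import re

# [libdefaults] settings that hold lists of Kerberos encryption types.
ENCTYPE_KEYS = {
    "permitted_enctypes",
    "default_tgs_enctypes",
    "default_tkt_enctypes",
    "default_etypes",
    "default_etypes_des",
}

# Name prefixes for RC4 (rc4-hmac, arcfour-hmac-md5) and single-DES enctypes.
WEAK_PREFIXES = ("rc4", "arcfour", "des-cbc")


def libdefaults_enctypes(text):
    """Return {setting: [enctype, ...]} for the enctype lists in [libdefaults].

    Only flat 'key = value' lines are understood; nested '{ }' blocks
    (e.g. in [realms]) live in other sections and are skipped.
    """
    section = None
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = header.group(1)
            continue
        if section != "libdefaults" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ENCTYPE_KEYS:
            # Enctype lists may be separated by spaces and/or commas.
            result[key] = [e for e in re.split(r"[\s,]+", value) if e]
    return result


def weak_enctypes(settings):
    """Return only the RC4/single-DES entries, keyed by setting name."""
    weak = {}
    for key, enctypes in settings.items():
        flagged = [e for e in enctypes if e.lower().startswith(WEAK_PREFIXES)]
        if flagged:
            weak[key] = flagged
    return weak


# Example using lines from the [libdefaults] section quoted above.
sample = """[libdefaults]
permitted_enctypes = rc4-hmac aes256-cts aes128-cts
default_tgs_enctypes = rc4-hmac aes256-cts aes128-cts
default_etypes = arcfour-hmac-md5
default_etypes_des = des-cbc-crc
[libvas]
use-tcp-only = true
"""
print(weak_enctypes(libdefaults_enctypes(sample)))
```

On a host, the same functions can be pointed at the real file, e.g. weak_enctypes(libdefaults_enctypes(open("/etc/krb5.conf").read())), and the result compared across the working (Hue) host and the failing hosts.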
11-09-2024
07:58 AM
Hi Community, I am unable to stop or start some Cloudera services.
Cloudera Manager version: 7.11.3
CDP version: 7.1.7 SP3
Below is the type of error I get when trying to stop a role (e.g., the Impala daemon) from the Cloudera Manager console. In the UI, the role's status is shown as unknown, as below: host 207 has a question mark in its status column, which signifies an unknown status. On other hosts (206, 208), the HDFS DataNode has the same issue as the Impala daemon, and the Hue load balancer instance is affected as well. Everything was working fine until yesterday. I have tried restarting cloudera-scm-supervisord and cloudera-scm-agent, but no luck. Below is the error from cloudera-scm-agent.log that I get on every host running those services; apart from this error, there is nothing else in cloudera-scm-agent.log.
[09/Nov/2024 15:18:43 +0000] 14061 __run_queue process ERROR Failed to update {'id': 1546497355, 'name': 'impala-IMPALAD', 'program': 'impala/impala.sh', 'arguments': ['impalad', 'impalad_flags', 'false'], 'status_links': {'status': 'https://host-207.com:25000/'}, 'running': True, 'run_generation': 15, 'one_off': False, 'auto_restart': True, 'user': 'impala', 'group': 'impala', 'extra_groups': [], 'environment': {'GLOG_log_dir': '/data/log/impalad', 'HADOOP_CREDSTORE_PASSWORD': 'somePassword', 'JAVA_TOOL_OPTIONS': '-Xms8589934592 -Xmx8589934592 -Djavax.net.ssl.trustStore=/etc/pki/tls/private/trust.jks -Djavax.net.ssl.trustStorePassword=somePassword -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/impala_impala-IMPALAD-c4fcff50b410d1eeac2d6da18c375a7d_pid{{PID}}.hprof -XX:OnOutOfMemoryError={{AGENT_COMMON_DIR}}/killparent.sh', 'GLOG_logbuflevel': '0', 'JAVA_HOME': '/usr/java/default', 'GLOG_v': '1', 'GLOG_minloglevel': '0', 'CDH_VERSION': '7', 'USER': 'impala', 'GLOG_max_log_size': '20'}, 'resources': [{'dynamic': True, 'directory': 
None, 'file': None, 'tcp_listen': None, 'cpu': {'shares': 200}, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': {'weight': 100}, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': {'soft_limit': -1, 'hard_limit': -1}, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': {'limit_fds': None, 'limit_memlock': None}, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/audit', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/impala/impalad', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 25000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 
'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 22000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/log/impalad', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 21000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 21050}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 28000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 27000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 
'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/lineage', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 23000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala-minidumps', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 
'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 0}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/atlas-spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/impala/udfs', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 'directory': {'path': '/data/log/impalad/jstacks', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 
'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}], 'refresh_files': ['cloudera-stack-monitor.properties', 'cloudera-monitor.properties', 'cloudera-monitor.properties', 'navigator.client.properties', 'navigator.lineage.client.properties', 'impala-conf/fair-scheduler.xml', 'impala-conf/llama-site.xml', 'telepub.client.properties'], 'config_generation': 0, 'special_file_info': [], 'parcels': {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': '3.2.3.3.2.7172000.0-334-1.p0.37609510'}, 'required_tags': ['cdh', 'impala'], 'optional_tags': ['hdfs-client-plugin', 'impala-plugin'], 'start_timeout_seconds': 20, 'expected_exitcodes': [], 'start_retries': 3}
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 449, in handle_heartbeat
process = cls(agent.cfg, agent, raw)
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 187, in __init__
self.process_info = json.load(f)
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/__init__.py", line 467, in load
return loads(fp.read(),
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/__init__.py", line 525, in loads
return _default_decoder.decode(s)
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm not sure where else to look at this point.
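The traceback shows json.load(f) in cmf/process.py failing on the very first character, which is what Python's JSON decoders report when the input is empty: json.loads("") raises exactly "Expecting value: line 1 column 1 (char 0)". The log does not say which file was being read, so the sketch below simply walks a directory tree and reports *.json files that are empty, unreadable, or malformed. The function name is hypothetical, and the assumption that the offending file has a .json suffix is mine, not something the log confirms.

```python
import json
from pathlib import Path


def find_bad_json(root):
    """Return (path, reason) pairs for *.json files under root that are
    empty, unreadable, or not valid JSON."""
    bad = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            bad.append((str(path), f"unreadable: {exc}"))
            continue
        if not text.strip():
            # json.loads("") fails with "Expecting value: line 1 column 1 (char 0)"
            bad.append((str(path), "empty"))
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            bad.append((str(path), f"invalid JSON: {exc}"))
    return bad
```

For example, find_bad_json("/var/run/cloudera-scm-agent/process") would list any empty or malformed .json files under the agent's process directories (the same location that holds the role keytabs); run it as a user that can read those directories, e.g. root.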
10-21-2024
02:45 AM
Hi Community, I recently upgraded from CM 7.6.7 to CM 7.11.3 and from CDP 7.1.7 SP2 to CDP 7.1.7 SP3. Many services are now showing a web server error in Cloudera Manager, as shown below; one of those services is HDFS. When I checked the cloudera-scm-agent log, I found the following errors:
[21/Oct/2024 09:09:09 +0100] 2414 GM IMPALAD throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:25000/jsonmetrics?json'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1244, in http_error_401
retry = self.http_error_auth_reqed('www-authenticate',
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1124, in http_error_auth_reqed
return self.retry_http_digest_auth(req, authreq)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 1138, in retry_http_digest_auth
resp = self.parent.open(req, timeout=req.timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 388, in http_error_default
raise e
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/https.py", line 382, in http_error_default
return old(self, req, fp, code, msg, hdrs)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
[21/Oct/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[21/Oct/2024 09:09:09 +0100] 2414 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[21/Oct/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon WARNING Monitor slow to respond in readiness check: 45s GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07
[21/Oct/2024 09:09:55 +0100] 2414 MonitorDaemon-Scheduler daemon INFO Monitor expired: ('GenericMonitor HDFS-DATANODE for hdfs-DATANODE-f8021b8043faaa9d9d23bf9965e6ee07',)
[21/Oct/2024 09:09:55 +0100] 2414 GM NODEMANAGER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61006/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[21/Oct/2024 09:09:55 +0100] 2414 GM DATANODE throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:9865/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
[21/Oct/2024 09:09:55 +0100] 2414 GM REGIONSERVER throttling_logger ERROR Error fetching metrics at 'https://host.domain.com:61005/jmx'
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 157, in retry_http_kerberos_auth
neg_hdr = self.generate_request_header(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 111, in generate_request_header
result = k.authGSSClientStep(self.context, neg_value)
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Cryptosystem internal error', -1765328206))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 224, in _collect_and_parse_and_return
opened_url = urlopen_with_retry_on_authentication_errors(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 339, in urlopen_with_retry_on_authentication_errors
return function()
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/monitor/generic/metric_collectors.py", line 244, in _open_url
return self._urlopen_callout(
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/cmf/util/url.py", line 129, in urlopen_with_timeout
return opener.open(url, data, timeout)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/data/anaconda/miniconda_3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 228, in http_error_401
retry = self.http_error_auth_reqed(host, req, headers)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 149, in http_error_auth_reqed
return self.retry_http_kerberos_auth(req, headers, neg_value)
File "/data/cloudera/cm-agent/lib/python3.8/site-packages/urllib_kerberos/__init__.py", line 174, in retry_http_kerberos_auth
log.critical("GSSAPI Error: %s/%s" % (e[0][0], e[1][0]))
TypeError: 'GSSError' object is not subscriptable
Please help me out here.
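In each of these tracebacks, the final TypeError is a side effect rather than the root cause: urllib_kerberos's handler (line 174) formats the error with e[0][0] and e[1][0], which fails on Python 3 because exception objects are no longer indexable (Python 2 allowed this). That TypeError hides the original GSS failure raised by authGSSClientStep, "Cryptosystem internal error". The sketch below uses a hypothetical stand-in GSSError class (so it runs without pykerberos installed) with the same argument layout shown in the log, reproduces the TypeError, and shows how reading exc.args recovers the message. It only illustrates the failure mode; it is not a patch for the Cloudera agent.

```python
class GSSError(Exception):
    """Hypothetical stand-in for kerberos.GSSError, same args layout as the log."""


def describe_gss_error(exc):
    """Build 'major/minor' text from a GSSError using .args (Python 3 safe)."""
    # On Python 3, exc[0][0] raises TypeError; the positional
    # arguments passed to the exception live in exc.args instead.
    (major_msg, _major_code), (minor_msg, _minor_code) = exc.args
    return f"GSSAPI Error: {major_msg}/{minor_msg}"


err = GSSError(
    ("Unspecified GSS failure. Minor code may provide more information", 851968),
    ("Cryptosystem internal error", -1765328206),
)

try:
    err[0][0]  # the indexing style used in the failing handler
except TypeError as exc:
    print(exc)

print(describe_gss_error(err))
```

Reading the underlying message this way points back at the Kerberos layer (the "Cryptosystem internal error" minor status) rather than at the agent's Python code.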
10-21-2024
01:45 AM
It seems this issue is a special case caused by how our environment is set up. I received a script from the Cloudera support team, and our Hue service is now running fine. However, the script is only a temporary fix; once we upgrade, we will have to go back to the original script. Below is the main change they made in the hue.sh script:
export PYTHONPATH=/opt/cloudera/parcels/CDH/lib/hue/build/env/lib64/python2.7/site-packages
echo "print env 3st"
env;
10-19-2024
02:13 AM
I have been trying to upload the full log, but it keeps getting removed.