Created 11-09-2024 07:58 AM
Hi Community,
I am unable to stop or start some Cloudera services.
Cloudera Manager version: 7.11.3
CDP Version: 7.1.7 SP3
Below is the type of error I get when trying to stop, for example, the Impala daemon from the Cloudera Manager console.
In the UI, the status of the service is shown as unknown: host 207 has a question mark in its status column, which indicates an unknown status.
On the other hosts (206 and 208), the HDFS DataNode has the same issue as the Impala daemon.
Besides the Impala daemon and HDFS DataNode instances, the Hue Load Balancer instance has the same issue.
Everything was working fine until yesterday. I have tried restarting cloudera-scm-supervisord and cloudera-scm-agent, but with no luck.
Below is the error from cloudera-scm-agent.log on every host where those services run. Apart from this error, there is nothing else in cloudera-scm-agent.log.
[09/Nov/2024 15:18:43 +0000] 14061 __run_queue process ERROR Failed to update {'id': 1546497355, 'name': 'impala-IMPALAD', 'program': 'impala/impala.sh', 'arguments': ['impalad', 'impalad_flags', 'false'], 'status_links': {'status': 'https://host-207.com:25000/'}, 'running': True, 'run_generation': 15, 'one_off': False, 'auto_restart': True, 'user': 'impala', 'group': 'impala', 'extra_groups': [], 'environment': {'GLOG_log_dir': '/data/log/impalad', 'HADOOP_CREDSTORE_PASSWORD': 'somePassword', 'JAVA_TOOL_OPTIONS': '-Xms8589934592 -Xmx8589934592 -Djavax.net.ssl.trustStore=/etc/pki/tls/private/trust.jks -Djavax.net.ssl.trustStorePassword=somePassword -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/impala_impala-IMPALAD-c4fcff50b410d1eeac2d6da18c375a7d_pid{{PID}}.hprof -XX:OnOutOfMemoryError={{AGENT_COMMON_DIR}}/killparent.sh', 'GLOG_logbuflevel': '0', 'JAVA_HOME': '/usr/java/default', 'GLOG_v': '1', 'GLOG_minloglevel': '0', 'CDH_VERSION': '7', 'USER': 'impala', 'GLOG_max_log_size': '20'}, 'resources': [{'dynamic': True, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': {'shares': 200}, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': {'weight': 100}, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': {'soft_limit': -1, 'hard_limit': -1}, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': {'limit_fds': None, 'limit_memlock': None}, 'contents': None, 'install': None, 
'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/audit', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/impala/impalad', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 25000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 22000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/log/impalad', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 
'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 21000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 21050}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 28000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 27000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/lineage', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 23000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 
'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala-minidumps', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 0}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/atlas-spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/impala/udfs', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 
'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 'directory': {'path': '/data/log/impalad/jstacks', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}], 'refresh_files': ['cloudera-stack-monitor.properties', 'cloudera-monitor.properties', 'cloudera-monitor.properties', 'navigator.client.properties', 'navigator.lineage.client.properties', 'impala-conf/fair-scheduler.xml', 'impala-conf/llama-site.xml', 'telepub.client.properties'], 'config_generation': 0, 'special_file_info': [], 'parcels': {'CDH': 
'7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': '3.2.3.3.2.7172000.0-334-1.p0.37609510'}, 'required_tags': ['cdh', 'impala'], 'optional_tags': ['hdfs-client-plugin', 'impala-plugin'], 'start_timeout_seconds': 20, 'expected_exitcodes': [], 'start_retries': 3}
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 449, in handle_heartbeat
    process = cls(agent.cfg, agent, raw)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 187, in __init__
    self.process_info = json.load(f)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/__init__.py", line 467, in load
    return loads(fp.read(),
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm not sure where else to look at this point.
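Reading the traceback: the agent fails inside json.load(f) while building the process object (cmf/process.py, line 187), and "Expecting value: line 1 column 1 (char 0)" means the parser found no JSON value at the very first character. That most often means the file is empty (zero bytes) or starts with something that is not JSON. The traceback does not say which file was being read, so the sketch below is a read-only check that assumes the file is a *.json file somewhere under /var/run/cloudera-scm-agent/process (an assumption, not confirmed here). It reports every such file that is empty or fails to parse. Python's standard json module produces the same message as simplejson for empty input, so it is used instead.

```python
import json
import os
from typing import Iterator, Tuple


def find_bad_json(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, reason) for each *.json file under `root` that is empty
    or fails to parse. Read-only: files are only opened for reading."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".json"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    json.load(f)
            except ValueError as exc:
                # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
                yield path, str(exc)
            except OSError as exc:
                # Permission problems, or the file vanished during the walk.
                yield path, f"unreadable: {exc}"


if __name__ == "__main__":
    # Assumed location of the agent's per-process directories.
    for path, reason in find_bad_json("/var/run/cloudera-scm-agent/process"):
        print(f"{path}: {reason}")
```

An empty file is reported with exactly the message seen in the agent log, which makes it easy to match the output against the error above.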
Created 11-11-2024 09:31 AM
Hi @sayebogbon,
Could you please try removing the config files from "/var/run/cloudera-scm-agent/supervisor/include" using these steps:
1. Rename the process directory under "/var/run/cloudera-scm-agent/process".
2. Delete the orphaned process-directory symlinks from "/var/run/cloudera-scm-agent/supervisor/include".
3. Kill the running service processes:
kill -9 <pid>
4. Restart the CM agent, then stop the services from the CM server.
5. Start the services from CM again. The agent will create a new process directory and PID.
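Before step 2, it can help to see which entries in supervisor/include are actually orphaned. Below is a minimal, read-only sketch. It assumes the entries there are symlinks pointing into per-process directories under /var/run/cloudera-scm-agent/process, which is an assumption about the layout and is not confirmed in this thread. Once a process directory has been renamed in step 1, its link no longer resolves, and the sketch lists such broken links so you can review them before deleting anything. It deletes nothing itself.

```python
import os
from typing import List


def dangling_links(include_dir: str) -> List[str]:
    """Return entries in include_dir that are symlinks whose target no longer
    exists. Dry run only: it reports, it does not delete."""
    dangling = []
    for name in sorted(os.listdir(include_dir)):
        path = os.path.join(include_dir, name)
        # os.path.exists() follows the link, so it is False for a broken link,
        # while os.path.islink() is still True.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(path)
    return dangling


if __name__ == "__main__":
    include = "/var/run/cloudera-scm-agent/supervisor/include"
    if os.path.isdir(include):
        for path in dangling_links(include):
            print(f"{path} -> {os.readlink(path)}")
```

Printing the link target alongside each entry shows which renamed or removed process directory it used to point at.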
Created 11-11-2024 10:18 PM
The issue was resolved after I rebooted the host. I believe the reboot had the same effect as the steps you described above.
I can now start the DataNode, Impala daemon, and YARN. However, I am still unable to start the HBase RegionServer; I get the following error. I believe it is related to the znode file not existing in the process directory.
+ echo 'Adding HBoss JARs to HBase service classpath'
+ znode_cleanup regionserver
+ export 'HBASE_CLASSPATH=/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ HBASE_CLASSPATH='/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ exec /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase/../../bin/hbase --config /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER regionserver start
++ date
+ echo 'Tue 12 Nov 06:03:50 GMT 2024 Starting znode cleanup thread with HBASE_ZNODE_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 for regionserver'
++ replace_pid -Djava.net.preferIPv4Stack=true
++ echo -Djava.net.preferIPv4Stack=true
++ sed 's#{{PID}}#14618#g'
+ HBASE_OPTS=-Djava.net.preferIPv4Stack=true
+ '[' jaas.conf '!=' '' ']'
+ export 'HBASE_OPTS=-Djava.security.auth.login.config=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/jaas.conf -Djava.net.preferIPv4Stack=true'
+ HBASE_OPTS='-Djava.security.auth.login.config=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/jaas.conf -Djava.net.preferIPv4Stack=true'
+ LOG_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs/znode_cleanup.log
+ set +x
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
/opt/cloudera/cm-agent/service/hbase/hbase.sh: line 234: kill: (14618) - No such process
+ RET=0
+ '[' -f /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 ']'
++ date
+ echo 'Tue 12 Nov 06:03:56 GMT 2024 Znode file does not exist. No cleanup required.'
+ exit 0
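Reading the trace: the `kill` at hbase.sh line 234 reports "No such process" for PID 14618, which is the PID the script substituted into {{PID}} before exec'ing the RegionServer. The znode file check that follows then finds nothing. Together these suggest the RegionServer JVM had already exited shortly after starting, so "Znode file does not exist. No cleanup required." is most likely a consequence of that exit, not its cause. The reason for the exit would be in the role's own output, for example under the process directory's logs/ subdirectory (where the trace shows znode_cleanup.log is written). The sketch below is a Python analogue of the two checks visible in the trace. It does not reproduce hbase.sh itself, and the helper names are hypothetical.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single PID.
        raise ValueError("pid must be a positive integer")
    try:
        # Signal 0 runs the existence/permission checks without sending anything.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH, the "No such process" that hbase.sh reported
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True


def znode_file(process_dir: str, pid: int) -> str:
    """Path of the znode file the cleanup step looks for: <dir>/znode<PID>."""
    return os.path.join(process_dir, f"znode{pid}")


if __name__ == "__main__":
    # Values taken from the trace above; substitute your own process dir and PID.
    proc_dir = "/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER"
    pid = 14618
    print("process alive:", pid_alive(pid))
    print("znode file present:", os.path.isfile(znode_file(proc_dir, pid)))
```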
Below is the agent log.
[12/Nov/2024 05:53:11 +0000] 1559 MainThread heartbeat_tracker INFO HB stats (seconds): num:43 LIFE_MIN:0.08 min:0.04 mean:0.06 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:12 +0000] 1559 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.04 min:0.04 mean:0.07 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:16 +0000] 1559 CP Server WorkerThread _cplogging INFO 127.0.0.1 - - [12/Nov/2024:06:03:16] "GET /heartbeat HTTP/1.1" 200 2 "" "python-requests/2.26.0"
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] Updating process (remove).
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503323-hbase-REGIONSERVER] Orphaning process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process ERROR Error creating marker /var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 1302, in mark_orphan
    f = open(marker, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp'
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using specific audit plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Creating metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using specific metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using generic metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Creating profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util INFO Using generic profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Instantiating process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Updating process: True {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO First time to activate the process [1546503485-hbase-REGIONSERVER].
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Creating cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Creating cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Creating cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Refreshing process files: None
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO /opt/cloudera/cmlib/postgresql-connector.jar doesn't exists! Trying to find /usr/share/java/postgresql-connector-java.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO /usr/share/java/postgresql-connector-java.jar doesn't exists! Trying to find a postgres jar of the pattern /opt/cloudera/cmlib/postgres*.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO prepare_environment begin: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': '3.2.3.3.2.7172000.0-334-1.p0.37609510'}, ['cdh'], ['hdfs-client-plugin', 'cdh-plugin', 'hbase-plugin']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO The following requested parcels are not available: {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO Obtained tags ['cdh', 'impala', 'sentry', 'solr', 'spark', 'kafka', 'kudu'] for parcel CDH
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO Obtained tags ['spark3'] for parcel SPARK3
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel_patch INFO Patched parcel in /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125 for python3 compatibility.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel INFO prepare_environment end: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125'}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue __init__ INFO Extracted 19 files and 0 dirs to /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue throttling_logger INFO Added principal HTTP/host.com with keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: cpu
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.shares with value 400
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.rt_runtime_us with value 1000
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: io
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups INFO Reconfiguring cgroup pseudofile /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER/blkio.weight with value 200
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: memory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: directory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Evaluating resource: tcp_listen
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO reading limits: {'limit_fds': 32768, 'limit_memlock': None}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Launching process. one-off False, command hbase/hbase.sh, args ['regionserver', 'start']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue supervisor WARNING Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 1546503485-hbase-REGIONSERVER'>)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue supervisor INFO Triggering supervisord update.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin audit plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin metadata plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin profile plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon INFO Instantiating generic monitor for service HBASE and role REGIONSERVER
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Begin monitor refresh.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue abstract_monitor INFO Refreshing GenericMonitor HBASE-REGIONSERVER for None
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon INFO New monitor: (<cmf.monitor.generic.GenericMonitor object at 0x7f727379a2b0>,)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process INFO Daemon refresh complete for process 1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:20 +0000] 1559 Profile-Plugin navigator_plugin INFO Pipelines updated for Profile Plugin: set()
[12/Nov/2024 06:03:20 +0000] 1559 Audit-Plugin navigator_plugin INFO Pipelines updated for Audit Plugin: []
[12/Nov/2024 06:03:20 +0000] 1559 Metadata-Plugin navigator_plugin INFO Pipelines updated for Metadata Plugin: []
[12/Nov/2024 06:03:57 +0000] 1559 MainThread process INFO [1546503485-hbase-REGIONSERVER] Unregistered supervisor process FATAL
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups INFO Destroying cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups INFO Destroying cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups INFO Destroying cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:59 +0000] 1559 MainThread supervisor INFO Triggering supervisord update.
[12/Nov/2024 06:03:59 +0000] 1559 MainThread throttling_logger INFO Removed keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit from
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Updating process: False {'run_generation': (1, 2), 'running': (True, False)}
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process INFO [1546503485-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:04:15 +0000] 1559 Profile-Plugin navigator_plugin INFO stopping Profile Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Audit-Plugin navigator_plugin INFO stopping Audit Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Metadata-Plugin navigator_plugin INFO stopping Metadata Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:18 +0000] 1559 MonitorDaemon-Scheduler daemon INFO Monitor expired: ('GenericMonitor HBASE-REGIONSERVER for hbase-REGIONSERVER-78fd4f39bfc69a473cc5abed13e41dac',)
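When an agent log is this long, it helps to pull out only the records for one role that are warnings or errors, or that mention FATAL. Below is a minimal sketch. It assumes that every record begins with the "[dd/Mon/yyyy HH:MM:SS +zzzz] <pid>" prefix seen above, and that lines without that prefix (such as traceback lines) belong to the preceding record. The log path in the usage block is the usual default and may differ on your hosts. Applied to the log above for "hbase-REGIONSERVER", it would keep three records: the process_timestamp ERROR with its traceback, the BAD_NAME WARNING, and the "Unregistered supervisor process FATAL" line.

```python
import os
import re
from typing import Iterable, List

# Each agent log record starts like "[12/Nov/2024 06:03:16 +0000] 1559 ".
RECORD_START = re.compile(r"^\[\d{2}/\w{3}/\d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}\] \d+ ")
# The first level keyword in the record header is taken as its log level.
LEVEL = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def interesting_records(lines: Iterable[str], process: str) -> List[str]:
    """Return records mentioning `process` that are WARNING/ERROR/CRITICAL or
    whose header contains FATAL. Lines without the record prefix (e.g.
    traceback lines) are kept with the record that precedes them."""
    records: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append([line])
        else:
            records[-1].append(line)

    selected = []
    for rec in records:
        text = "\n".join(rec)
        level = LEVEL.search(rec[0])
        severe = level is not None and level.group(1) in ("WARNING", "ERROR", "CRITICAL")
        if process in text and (severe or "FATAL" in rec[0]):
            selected.append(text)
    return selected


if __name__ == "__main__":
    # Assumed default agent log location; adjust if yours differs.
    path = "/var/log/cloudera-scm-agent/cloudera-scm-agent.log"
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as f:
            for record in interesting_records(f, "hbase-REGIONSERVER"):
                print(record)
```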
Created on 11-12-2024 08:46 PM - edited 11-12-2024 08:46 PM
Do the process folder below and the file inside it exist? The error is "file not found".
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process ERROR Error creating marker /var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 1302, in mark_orphan
    f = open(marker, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp'
Try restarting the cloudera-scm-agent service, then restart the RegionServer from CM. If it still doesn't work, could you please try the workaround steps again?
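Some context for the question above. The failing call is open(marker, 'w') in mark_orphan. Opening a file for writing creates the file but not its parent directory, so Python raises FileNotFoundError [Errno 2] when the directory itself is missing. Note also which directory is involved: the error concerns the old process directory 1546503323-hbase-REGIONSERVER that the agent was orphaning, not the new 1546503485 one it creates a few lines later. One plausible explanation is that the reboot cleared /var/run, which is tmpfs on many systemd-based distributions, so the old directory no longer existed. That is an assumption and is not confirmed in this thread. The sketch below answers the question read-only for both directories. The helper name is hypothetical.

```python
import os


def check_marker(process_dir: str, marker: str = "process_timestamp") -> str:
    """Report whether a process directory and its marker file exist.

    Read-only: unlike the agent's open(marker, 'w'), nothing is created.
    """
    if not os.path.isdir(process_dir):
        return f"directory missing: {process_dir}"
    if not os.path.isfile(os.path.join(process_dir, marker)):
        return f"directory present, {marker} missing"
    return "directory and marker present"


if __name__ == "__main__":
    base = "/var/run/cloudera-scm-agent/process"
    # The old directory being orphaned, and the new one started right after it.
    for name in ("1546503323-hbase-REGIONSERVER", "1546503485-hbase-REGIONSERVER"):
        print(name, "->", check_marker(os.path.join(base, name)))
```

"directory missing" for the old directory would match the traceback exactly. A missing marker inside an existing directory is a different situation.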
Created 11-14-2024 10:43 AM
Thanks for getting back to me.
The process_timestamp file isn't there. It is also missing from the directories of other running processes.
I had tried the workaround and it didn't work, but I will give it another go.
Also, the symlink for the RegionServer process does not exist in the /var/run/cloudera-scm-agent/supervisor/include directory.