Datanode and Impala Daemon Instances Show Unknown Status on Cloudera Console


Hi Community,

I am unable to stop or start some Cloudera services.
Cloudera version: 7.11.3
CDP Version: 7.1.7 SP3
Below is the kind of error I get when trying to stop, for example, the Impala daemon from the Cloudera Manager console.

In the UI, the status of the service shows as unknown. Host 207 has a question mark in its status column, which signifies unknown status.


Also, on the other hosts (206, 208), the HDFS DataNode has the same issue as the Impala daemon.

Apart from the Impala daemon and HDFS DataNode instances, I get the same issue on the Hue load balancer instance.

Everything was fine until yesterday. I have tried restarting cloudera-scm-supervisord and cloudera-scm-agent, but no luck.

Below is the error from cloudera-scm-agent.log that I get on all the hosts where those services are running. There is nothing else in cloudera-scm-agent.log apart from the following error.


[09/Nov/2024 15:18:43 +0000] 14061 __run_queue process      ERROR    Failed to update {'id': 1546497355, 'name': 'impala-IMPALAD', 'program': 'impala/', 'arguments': ['impalad', 'impalad_flags', 'false'], 'status_links': {'status': ''}, 'running': True, 'run_generation': 15, 'one_off': False, 'auto_restart': True, 'user': 'impala', 'group': 'impala', 'extra_groups': [], 'environment': {'GLOG_log_dir': '/data/log/impalad', 'HADOOP_CREDSTORE_PASSWORD': 'somePassword', 'JAVA_TOOL_OPTIONS': '-Xms8589934592 -Xmx8589934592 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/impala_impala-IMPALAD-c4fcff50b410d1eeac2d6da18c375a7d_pid{{PID}}.hprof -XX:OnOutOfMemoryError={{AGENT_COMMON_DIR}}/', 'GLOG_logbuflevel': '0', 'JAVA_HOME': '/usr/java/default', 'GLOG_v': '1', 'GLOG_minloglevel': '0', 'CDH_VERSION': '7', 'USER': 'impala', 'GLOG_max_log_size': '20'}, 'resources': [{'dynamic': True, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': {'shares': 200}, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': {'weight': 100}, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': {'soft_limit': -1, 'hard_limit': -1}, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': {'limit_fds': None, 'limit_memlock': None}, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/audit', 'user': 'impala', 'group': 
'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/impala/impalad', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 25000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 22000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/log/impalad', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 21000}, 'cpu': None, 'named_cpu': None, 
'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 21050}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 28000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 27000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/lineage', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 23000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad', 'user': 'impala', 
'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala-minidumps', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '', 'port': 0}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/atlas-spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/impala/udfs', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 
'directory': {'path': '/data/log/impalad/jstacks', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}], 'refresh_files': ['', '', '', '', '', 'impala-conf/fair-scheduler.xml', 'impala-conf/llama-site.xml', ''], 'config_generation': 0, 'special_file_info': [], 'parcels': {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': ''}, 'required_tags': ['cdh', 'impala'], 'optional_tags': ['hdfs-client-plugin', 'impala-plugin'], 'start_timeout_seconds': 20, 'expected_exitcodes': [], 'start_retries': 3}
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/", line 449, in handle_heartbeat
    process = cls(agent.cfg, agent, raw)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/", line 187, in __init__
    self.process_info = json.load(f)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/", line 467, in load
    return loads(,
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/", line 525, in loads
    return _default_decoder.decode(s)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)


 I'm not sure where else to look at this point.
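For anyone reading the traceback above: "Expecting value: line 1 column 1 (char 0)" is what a JSON parser reports when its input is empty or does not start with a JSON value. The traceback does not show which file the agent was loading, so the sketch below is only a rough way to look for such files. The helper name `find_unparsable_json`, the directory you pass in, and the `*.json` pattern are all assumptions, not something taken from the agent code.

```python
import json
from pathlib import Path

# An empty string reproduces the exact message seen in the agent log.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)


def find_unparsable_json(root, pattern="*.json"):
    """Return (path, reason) pairs for files under `root` that do not parse as JSON.

    `root` and `pattern` are hypothetical: pass whatever directory and file
    pattern you actually want to inspect.
    """
    problems = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError as exc:
            # For example, a permission problem when not running as root.
            problems.append((str(path), f"unreadable: {exc}"))
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            reason = "empty file" if not text.strip() else str(exc)
            problems.append((str(path), reason))
    return problems
```

Calling `find_unparsable_json("/var/run/cloudera-scm-agent/process")` as a user who can read that directory would list candidates; the function only reads files and changes nothing.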


Master Collaborator

Hi @sayebogbon ,

Could you please try removing the config files from "/var/run/cloudera-scm-agent/supervisor/include"?

1. Rename the process dir in "/var/run/cloudera-scm-agent/process".

2. Delete the orphan process dir soft link from "/var/run/cloudera-scm-agent/supervisor/include".

3. Kill the running service processes:
kill -9 <pid>

4. Restart the CM agent and stop the services from the CM server.

5. Start the services from CM again. The agent should create a new process dir and PID.
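Before deleting anything in step 2, it may help to list which links are actually broken. This is a minimal sketch, assuming that an "orphan" soft link means a symlink whose target no longer exists; the helper name `find_dangling_symlinks` is made up for illustration, and it only reports, it does not delete.

```python
import os


def find_dangling_symlinks(directory):
    """Return the names of symlinks in `directory` whose targets no longer exist."""
    dangling = []
    for entry in os.scandir(directory):
        # is_symlink() looks at the link itself; os.path.exists() follows it,
        # so a link pointing at a missing target is a symlink that "does not exist".
        if entry.is_symlink() and not os.path.exists(entry.path):
            dangling.append(entry.name)
    return sorted(dangling)
```

For example, `find_dangling_symlinks("/var/run/cloudera-scm-agent/supervisor/include")`, run as a user with read access to that directory, returns the link names to review by hand.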






The issue was sorted out after I rebooted the host. I believe the reboot did the same things you mentioned above.

I can start the DataNode, Impala daemon, and YARN. However, I am still unable to start the HBase RegionServer. I'm getting the following error. I believe it's related to the znode file not existing in the process directory.


+ echo 'Adding HBoss JARs to HBase service classpath'
+ znode_cleanup regionserver
+ export 'HBASE_CLASSPATH=/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ HBASE_CLASSPATH='/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ exec /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase/../../bin/hbase --config /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER regionserver start
++ date
+ echo 'Tue 12 Nov 06:03:50 GMT 2024 Starting znode cleanup thread with HBASE_ZNODE_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 for regionserver'
++ replace_pid
++ echo
++ sed 's#{{PID}}#14618#g'
+ '[' jaas.conf '!=' '' ']'
+ export ''
+ LOG_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs/znode_cleanup.log
+ set +x
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
/opt/cloudera/cm-agent/service/hbase/ line 234: kill: (14618) - No such process
+ RET=0
+ '[' -f /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 ']'
++ date
+ echo 'Tue 12 Nov 06:03:56 GMT 2024 Znode file does not exist. No cleanup required.'
+ exit 0



Below is the agent log.


[12/Nov/2024 05:53:11 +0000] 1559 MainThread heartbeat_tracker INFO     HB stats (seconds): num:43 LIFE_MIN:0.08 min:0.04 mean:0.06 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:12 +0000] 1559 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.04 min:0.04 mean:0.07 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:16 +0000] 1559 CP Server WorkerThread _cplogging   INFO - - [12/Nov/2024:06:03:16] "GET /heartbeat HTTP/1.1" 200 2 "" "python-requests/2.26.0"
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] Updating process (remove).
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] Orphaning process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      ERROR    Error creating marker /var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/", line 1302, in mark_orphan
    f = open(marker, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp'
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using specific audit plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Creating metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using specific metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using generic metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Creating profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using generic profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Instantiating process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Updating process: True {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     First time to activate the process [1546503485-hbase-REGIONSERVER].
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Creating cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Creating cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Creating cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Refreshing process files: None
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     /opt/cloudera/cmlib/postgresql-connector.jar doesn't exists! Trying to find /usr/share/java/postgresql-connector-java.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     /usr/share/java/postgresql-connector-java.jar doesn't exists! Trying to find a postgres jar of the pattern /opt/cloudera/cmlib/postgres*.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     prepare_environment begin: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': ''}, ['cdh'], ['hdfs-client-plugin', 'cdh-plugin', 'hbase-plugin']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     The following requested parcels are not available: {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     Obtained tags ['cdh', 'impala', 'sentry', 'solr', 'spark', 'kafka', 'kudu'] for parcel CDH
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     Obtained tags ['spark3'] for parcel SPARK3
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel_patch INFO     Patched parcel in /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125 for python3 compatibility.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     prepare_environment end: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125'}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue __init__     INFO     Extracted 19 files and 0 dirs to /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue throttling_logger INFO     Added principal HTTP/ with keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: cpu
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.shares with value 400
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.rt_runtime_us with value 1000
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: io
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Reconfiguring cgroup pseudofile /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER/blkio.weight with value 200
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: memory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: directory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: tcp_listen
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     reading limits: {'limit_fds': 32768, 'limit_memlock': None}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Launching process. one-off False, command hbase/, args ['regionserver', 'start']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue supervisor   WARNING  Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 1546503485-hbase-REGIONSERVER'>)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue supervisor   INFO     Triggering supervisord update.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin audit plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin metadata plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin profile plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon       INFO     Instantiating generic monitor for service HBASE and role REGIONSERVER
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin monitor refresh.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue abstract_monitor INFO     Refreshing GenericMonitor HBASE-REGIONSERVER for None
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon       INFO     New monitor: (<cmf.monitor.generic.GenericMonitor object at 0x7f727379a2b0>,)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Daemon refresh complete for process 1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:20 +0000] 1559 Profile-Plugin navigator_plugin INFO     Pipelines updated for Profile Plugin: set()
[12/Nov/2024 06:03:20 +0000] 1559 Audit-Plugin navigator_plugin INFO     Pipelines updated for Audit Plugin: []
[12/Nov/2024 06:03:20 +0000] 1559 Metadata-Plugin navigator_plugin INFO     Pipelines updated for Metadata Plugin: []
[12/Nov/2024 06:03:57 +0000] 1559 MainThread process      INFO     [1546503485-hbase-REGIONSERVER] Unregistered supervisor process FATAL
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups      INFO     Destroying cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups      INFO     Destroying cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups      INFO     Destroying cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:59 +0000] 1559 MainThread supervisor   INFO     Triggering supervisord update.
[12/Nov/2024 06:03:59 +0000] 1559 MainThread throttling_logger INFO     Removed keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit from
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Updating process: False {'run_generation': (1, 2), 'running': (True, False)}
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:04:15 +0000] 1559 Profile-Plugin navigator_plugin INFO     stopping Profile Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Audit-Plugin navigator_plugin INFO     stopping Audit Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Metadata-Plugin navigator_plugin INFO     stopping Metadata Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:18 +0000] 1559 MonitorDaemon-Scheduler daemon       INFO     Monitor expired: ('GenericMonitor HBASE-REGIONSERVER for hbase-REGIONSERVER-78fd4f39bfc69a473cc5abed13e41dac',)
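In a log this long, most lines are INFO, and the interesting ones (the "Error creating marker" ERROR and the BAD_NAME WARNING) are easy to miss. The sketch below pulls out only WARNING and ERROR entries. The regular expression is inferred from the lines pasted above rather than from any documented log format, and the name `filter_agent_log` is hypothetical. Traceback continuation lines do not match the pattern and are skipped.

```python
import re

# Layout inferred from the pasted lines:
# [timestamp] pid thread-and-module LEVEL message
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<source>.+?)\s+"
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+(?P<message>.*)$"
)


def filter_agent_log(lines, levels=("WARNING", "ERROR")):
    """Return (timestamp, level, message) for log lines at the given levels."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            hits.append(
                (match.group("timestamp"), match.group("level"), match.group("message"))
            )
    return hits
```

It accepts any iterable of lines, so an open file object for cloudera-scm-agent.log can be passed directly.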


Master Collaborator

Do the process folder below and the file inside it exist? The error is "file not found".

[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      ERROR    Error creating marker /var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/", line 1302, in mark_orphan
    f = open(marker, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp'

Try restarting the cloudera-scm-agent service and then restart the RegionServer from CM. If it still doesn't work, could you please try the workaround again?
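One note on the traceback quoted above: in Python, `open(path, 'w')` creates the file if it is missing, so a FileNotFoundError from that call generally means the parent directory does not exist, not the file. A small sketch of that check follows; the helper name `explain_marker_failure` is hypothetical and not part of the agent.

```python
import os


def explain_marker_failure(marker_path):
    """Describe the state of the directory that should contain `marker_path`."""
    parent = os.path.dirname(marker_path)
    if not os.path.isdir(parent):
        # open(marker_path, "w") raises FileNotFoundError in this case.
        return f"parent directory missing: {parent}"
    if not os.access(parent, os.W_OK):
        return f"parent directory not writable: {parent}"
    return "parent directory exists and is writable"
```

Passing it the process_timestamp path from the error message would show whether the old 1546503323-hbase-REGIONSERVER directory is still present on that host.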


Thanks for getting back.

The process_timestamp file isn't there. It isn't present for other running processes either.
I had tried the workaround and it didn't work, but I will give it another go.
Another thing: the soft link for the RegionServer process does not exist in the /var/run/cloudera-scm-agent/supervisor/include directory.