Datanode and Impala Daemon Instances Show Unknown Status on Cloudera Console

Contributor

Hi Community,

I am unable to stop or start some services from Cloudera Manager.
Cloudera Manager version: 7.11.3
CDP version: 7.1.7 SP3
Below is the error I get when I try to stop, for example, the Impala daemon from the Cloudera Manager console.
(screenshot: impalad-stop-for-host_207.png)

In the UI, the status of the service is unknown, as shown below. Host 207 has a question mark in its status column, which indicates an unknown status.

(screenshot: impalad-daemon-instance-status.png)

On the other hosts (206 and 208), the HDFS DataNode has the same issue as the Impala daemon.

(screenshot: data-node-instance-status.png)

Apart from the Impala daemon and HDFS DataNode instances, I see the same issue on the Hue Load Balancer instance.


Everything was fine until yesterday. I have tried restarting cloudera-scm-supervisord and cloudera-scm-agent, but no luck.

Below is the error from cloudera-scm-agent.log that I get on every host where those services are running. There is nothing else in cloudera-scm-agent.log apart from this error.

 

[09/Nov/2024 15:18:43 +0000] 14061 __run_queue process      ERROR    Failed to update {'id': 1546497355, 'name': 'impala-IMPALAD', 'program': 'impala/impala.sh', 'arguments': ['impalad', 'impalad_flags', 'false'], 'status_links': {'status': 'https://host-207.com:25000/'}, 'running': True, 'run_generation': 15, 'one_off': False, 'auto_restart': True, 'user': 'impala', 'group': 'impala', 'extra_groups': [], 'environment': {'GLOG_log_dir': '/data/log/impalad', 'HADOOP_CREDSTORE_PASSWORD': 'somePassword', 'JAVA_TOOL_OPTIONS': '-Xms8589934592 -Xmx8589934592 -Djavax.net.ssl.trustStore=/etc/pki/tls/private/trust.jks -Djavax.net.ssl.trustStorePassword=somePassword -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/impala_impala-IMPALAD-c4fcff50b410d1eeac2d6da18c375a7d_pid{{PID}}.hprof -XX:OnOutOfMemoryError={{AGENT_COMMON_DIR}}/killparent.sh', 'GLOG_logbuflevel': '0', 'JAVA_HOME': '/usr/java/default', 'GLOG_v': '1', 'GLOG_minloglevel': '0', 'CDH_VERSION': '7', 'USER': 'impala', 'GLOG_max_log_size': '20'}, 'resources': [{'dynamic': True, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': {'shares': 200}, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': {'weight': 100}, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': {'soft_limit': -1, 'hard_limit': -1}, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': {'limit_fds': None, 'limit_memlock': None}, 'contents': None, 'install': 
None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/audit', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/impala/impalad', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 25000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 22000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/data/log/impalad', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': 
None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 21000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 21050}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 28000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 27000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad/lineage', 'user': 'impala', 'group': 'impala', 'mode': 448, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 23000}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': 
None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impalad', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala-minidumps', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': None, 'file': None, 'tcp_listen': {'bind_address': '0.0.0.0', 'port': 0}, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/atlas-spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/impala/udfs', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 
'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': True, 'directory': {'path': '/data/log/impalad/jstacks', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/lib/ranger/impala/policy-cache', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/hdfs/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}, {'dynamic': False, 'directory': {'path': '/var/log/impala/audit/solr/spool', 'user': 'impala', 'group': 'impala', 'mode': 493, 'bytes_free_warning_threshhold_bytes': 0}, 'file': None, 'tcp_listen': None, 'cpu': None, 'named_cpu': None, 'io': None, 'memory': None, 'rlimits': None, 'contents': None, 'install': None, 'named_resource': None, 'custom_resource': None}], 'refresh_files': ['cloudera-stack-monitor.properties', 'cloudera-monitor.properties', 'cloudera-monitor.properties', 'navigator.client.properties', 'navigator.lineage.client.properties', 'impala-conf/fair-scheduler.xml', 'impala-conf/llama-site.xml', 'telepub.client.properties'], 'config_generation': 0, 'special_file_info': [], 'parcels': {'CDH': 
'7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': '3.2.3.3.2.7172000.0-334-1.p0.37609510'}, 'required_tags': ['cdh', 'impala'], 'optional_tags': ['hdfs-client-plugin', 'impala-plugin'], 'start_timeout_seconds': 20, 'expected_exitcodes': [], 'start_retries': 3}
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 449, in handle_heartbeat
    process = cls(agent.cfg, agent, raw)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 187, in __init__
    self.process_info = json.load(f)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/__init__.py", line 467, in load
    return loads(fp.read(),
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

 

I'm not sure where else to look at this point.
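
For what it's worth, "Expecting value: line 1 column 1 (char 0)" is the message a JSON decoder gives when its input is empty or does not start with a JSON value, so the file that process.py reads at line 187 is probably empty or truncated. The traceback does not say which file that is. Below is a minimal sketch for finding such files, assuming they live under the agent's process directory and end in .json (both are assumptions, and find_unparseable_json is a hypothetical helper, not part of the agent):

```python
import json
import os


def find_unparseable_json(root):
    """Return sorted paths of *.json files under root that are empty,
    unreadable, or not valid JSON."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".json"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8") as f:
                    json.load(f)
            except (ValueError, OSError):
                # json.JSONDecodeError is a subclass of ValueError;
                # OSError covers files that cannot be opened or read.
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Assumed location, taken from the paths quoted in this thread.
    for path in find_unparseable_json("/var/run/cloudera-scm-agent/process"):
        print(path)
```

It only reads files and prints the suspicious ones, so it should be safe to run as root on an affected host.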


Master Collaborator

Hi @sayebogbon ,

Could you please try removing the config files from "/var/run/cloudera-scm-agent/supervisor/include"?

1. Rename the process dir under "/var/run/cloudera-scm-agent/process".

2. Delete the orphaned process dir soft links from "/var/run/cloudera-scm-agent/supervisor/include".

3. Kill the running service processes:
kill -9 pid

4. Restart the CM agent and stop the services from the CM server.

5. Start the services from CM again. The agent should create a new process dir and pid.
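
Before deleting anything in step 2, it may help to list which links are actually orphaned. Here is a small read-only sketch, assuming the entries in the include directory are soft links whose targets have gone away; find_dangling_symlinks is a hypothetical helper name, not a Cloudera tool:

```python
import os


def find_dangling_symlinks(directory):
    """Return sorted paths of symlinks in directory whose targets no longer exist."""
    dangling = []
    for entry in os.scandir(directory):
        # is_symlink() inspects the link itself; os.path.exists() follows it.
        if entry.is_symlink() and not os.path.exists(entry.path):
            dangling.append(entry.path)
    return sorted(dangling)


if __name__ == "__main__":
    include_dir = "/var/run/cloudera-scm-agent/supervisor/include"
    if os.path.isdir(include_dir):
        for link in find_dangling_symlinks(include_dir):
            print(link)
```

It prints the candidates and removes nothing, so the output can be reviewed before any cleanup.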

Contributor

 

 

 

The issue was resolved after I rebooted the host. I believe the reboot had the same effect as the steps you listed above.

I can now start the DataNode, Impala daemon, and YARN. However, I am still unable to start the HBase RegionServer. I'm getting the following error, which I believe is related to the znode file not existing in the process directory.


 

+ echo 'Adding HBoss JARs to HBase service classpath'
+ znode_cleanup regionserver
+ export 'HBASE_CLASSPATH=/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ HBASE_CLASSPATH='/opt/cloudera/cm/lib/plugins/event-publish-7.11.3-shaded.jar:/opt/cloudera/cm/lib/plugins/tt-instrumentation-7.11.3.jar:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase_filesystem/lib/*'
+ exec /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125/lib/hbase/../../bin/hbase --config /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER regionserver start
++ date
+ echo 'Tue 12 Nov 06:03:50 GMT 2024 Starting znode cleanup thread with HBASE_ZNODE_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 for regionserver'
++ replace_pid -Djava.net.preferIPv4Stack=true
++ echo -Djava.net.preferIPv4Stack=true
++ sed 's#{{PID}}#14618#g'
+ HBASE_OPTS=-Djava.net.preferIPv4Stack=true
+ '[' jaas.conf '!=' '' ']'
+ export 'HBASE_OPTS=-Djava.security.auth.login.config=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/jaas.conf -Djava.net.preferIPv4Stack=true'
+ HBASE_OPTS='-Djava.security.auth.login.config=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/jaas.conf -Djava.net.preferIPv4Stack=true'
+ LOG_FILE=/var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs/znode_cleanup.log
+ set +x
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
/opt/cloudera/cm-agent/service/hbase/hbase.sh: line 234: kill: (14618) - No such process
+ RET=0
+ '[' -f /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/znode14618 ']'
++ date
+ echo 'Tue 12 Nov 06:03:56 GMT 2024 Znode file does not exist. No cleanup required.'
+ exit 0

 

 

Below is the agent log.

 

[12/Nov/2024 05:53:11 +0000] 1559 MainThread heartbeat_tracker INFO     HB stats (seconds): num:43 LIFE_MIN:0.08 min:0.04 mean:0.06 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:12 +0000] 1559 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.04 min:0.04 mean:0.07 max:0.11 LIFE_MAX:0.20
[12/Nov/2024 06:03:16 +0000] 1559 CP Server WorkerThread _cplogging   INFO     127.0.0.1 - - [12/Nov/2024:06:03:16] "GET /heartbeat HTTP/1.1" 200 2 "" "python-requests/2.26.0"
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] Updating process (remove).
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503323-hbase-REGIONSERVER] Orphaning process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      ERROR    Error creating marker /var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 1302, in mark_orphan
    f = open(marker, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp'
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using specific audit plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Creating metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using specific metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using generic metadata plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Creating profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue util         INFO     Using generic profile plugin for process hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Instantiating process
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Updating process: True {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     First time to activate the process [1546503485-hbase-REGIONSERVER].
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Creating cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Creating cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Creating cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Created /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chowning /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to hbase (39993) hbase (39993)
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue agent        INFO     Chmod'ing /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/logs to 0751
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Refreshing process files: None
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     /opt/cloudera/cmlib/postgresql-connector.jar doesn't exists! Trying to find /usr/share/java/postgresql-connector-java.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     /usr/share/java/postgresql-connector-java.jar doesn't exists! Trying to find a postgres jar of the pattern /opt/cloudera/cmlib/postgres*.jar
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     prepare_environment begin: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125', 'SPARK3': '3.2.3.3.2.7172000.0-334-1.p0.37609510'}, ['cdh'], ['hdfs-client-plugin', 'cdh-plugin', 'hbase-plugin']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     The following requested parcels are not available: {}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     Obtained tags ['cdh', 'impala', 'sentry', 'solr', 'spark', 'kafka', 'kudu'] for parcel CDH
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     Obtained tags ['spark3'] for parcel SPARK3
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel_patch INFO     Patched parcel in /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p3013.57035125 for python3 compatibility.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue parcel       INFO     prepare_environment end: {'CDH': '7.1.7-1.cdh7.1.7.p3013.57035125'}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue __init__     INFO     Extracted 19 files and 0 dirs to /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue throttling_logger INFO     Added principal HTTP/host.com with keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: cpu
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.shares with value 400
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Reconfiguring cgroup pseudofile /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER/cpu.rt_runtime_us with value 1000
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: io
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue cgroups      INFO     Reconfiguring cgroup pseudofile /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER/blkio.weight with value 200
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: memory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: directory
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Evaluating resource: tcp_listen
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     reading limits: {'limit_fds': 32768, 'limit_memlock': None}
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Launching process. one-off False, command hbase/hbase.sh, args ['regionserver', 'start']
[12/Nov/2024 06:03:16 +0000] 1559 __run_queue supervisor   WARNING  Failed while getting process info. Retrying. (<Fault 10: 'BAD_NAME: 1546503485-hbase-REGIONSERVER'>)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue supervisor   INFO     Triggering supervisord update.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin audit plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin metadata plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin profile plugin refresh
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon       INFO     Instantiating generic monitor for service HBASE and role REGIONSERVER
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Begin monitor refresh.
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue abstract_monitor INFO     Refreshing GenericMonitor HBASE-REGIONSERVER for None
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue daemon       INFO     New monitor: (<cmf.monitor.generic.GenericMonitor object at 0x7f727379a2b0>,)
[12/Nov/2024 06:03:18 +0000] 1559 __run_queue process      INFO     Daemon refresh complete for process 1546503485-hbase-REGIONSERVER.
[12/Nov/2024 06:03:20 +0000] 1559 Profile-Plugin navigator_plugin INFO     Pipelines updated for Profile Plugin: set()
[12/Nov/2024 06:03:20 +0000] 1559 Audit-Plugin navigator_plugin INFO     Pipelines updated for Audit Plugin: []
[12/Nov/2024 06:03:20 +0000] 1559 Metadata-Plugin navigator_plugin INFO     Pipelines updated for Metadata Plugin: []
[12/Nov/2024 06:03:57 +0000] 1559 MainThread process      INFO     [1546503485-hbase-REGIONSERVER] Unregistered supervisor process FATAL
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups      INFO     Destroying cgroup /sys/fs/cgroup/blkio/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups      INFO     Destroying cgroup /sys/fs/cgroup/cpu,cpuacct/system.slice/cloudera-scm-agent.service/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:57 +0000] 1559 MainThread cgroups      INFO     Destroying cgroup /sys/fs/cgroup/devices/1546503485-hbase-REGIONSERVER
[12/Nov/2024 06:03:59 +0000] 1559 MainThread supervisor   INFO     Triggering supervisord update.
[12/Nov/2024 06:03:59 +0000] 1559 MainThread throttling_logger INFO     Removed keytab /var/run/cloudera-scm-agent/process/1546503485-hbase-REGIONSERVER/hbase.keytab as a candidate to kinit from
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Updating process: False {'run_generation': (1, 2), 'running': (True, False)}
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] Deactivating process (skipped)
[12/Nov/2024 06:04:12 +0000] 1559 __run_queue process      INFO     [1546503485-hbase-REGIONSERVER] stopping monitors
[12/Nov/2024 06:04:15 +0000] 1559 Profile-Plugin navigator_plugin INFO     stopping Profile Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Audit-Plugin navigator_plugin INFO     stopping Audit Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:15 +0000] 1559 Metadata-Plugin navigator_plugin INFO     stopping Metadata Plugin for hbase-REGIONSERVER with count 0 pipelines names [].
[12/Nov/2024 06:04:18 +0000] 1559 MonitorDaemon-Scheduler daemon       INFO     Monitor expired: ('GenericMonitor HBASE-REGIONSERVER for hbase-REGIONSERVER-78fd4f39bfc69a473cc5abed13e41dac',)
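
Since most of this log is INFO noise, a small filter can make the relevant lines easier to spot. This is only a sketch: the line layout ([timestamp] pid thread module LEVEL message) is inferred from the lines quoted above, and LINE_RE and problem_lines are hypothetical names:

```python
import re

# Layout assumed from the quoted log lines:
# [timestamp] pid thread module LEVEL message
# The thread name may contain spaces (e.g. "CP Server WorkerThread").
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<pid>\d+) (?P<thread>.+?) (?P<module>\S+)\s+"
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+(?P<msg>.*)$"
)


def problem_lines(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Yield (timestamp, module, level, message) for lines at the given levels.

    Lines that do not match the layout (tracebacks, wrapped text) are skipped.
    """
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") in levels:
            yield m.group("ts"), m.group("module"), m.group("level"), m.group("msg")
```

Passing an open file handle for cloudera-scm-agent.log to problem_lines yields only the warnings and errors. In the excerpt above, that would be the "Error creating marker" ERROR and the BAD_NAME WARNING from supervisor. Note that the "Unregistered supervisor process FATAL" line is logged at INFO, so a level filter alone would not catch it.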

 

Master Collaborator

Do the process folder below and the file inside it exist? The ERROR says the file was not found.

[12/Nov/2024 06:03:16 +0000] 1559 __run_queue process      ERROR    Error creating marker /var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python3.8/site-packages/cmf/process.py", line 1302, in mark_orphan
    f = open(marker, 'w')
FileNotFoundError: [Errno 2] No such file or directory: '/var/run/cloudera-scm-agent/process/1546503323-hbase-REGIONSERVER/process_timestamp'

Try restarting the cloudera-scm-agent service and then restart the RegionServer from CM. If it still doesn't work, could you please try the workaround steps again?
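
One detail on this traceback: mark_orphan opens the marker with mode 'w', which creates the file if it is missing. In Python, errno 2 on such a call normally means the parent directory does not exist, so the thing to check is the 1546503323-hbase-REGIONSERVER directory rather than process_timestamp itself. Here is a minimal sketch of that behaviour (explain_marker_failure is a hypothetical helper, not agent code, and is best tried in a scratch directory because it creates the file on success):

```python
import os


def explain_marker_failure(marker):
    """Open marker with mode 'w', as the traceback shows, and describe the outcome."""
    parent = os.path.dirname(marker)
    try:
        with open(marker, "w"):
            pass
    except FileNotFoundError:
        # Mode 'w' creates a missing file, so this error points at the directory.
        return "missing directory: " + parent
    return "created: " + marker
```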

Contributor

Thanks for getting back.

The process_timestamp file isn't there. It is also missing from the process directories of other running processes.
I tried the workaround and it didn't work, but I will give it another go.
One more thing: the soft link for the RegionServer process does not exist in the /var/run/cloudera-scm-agent/supervisor/include directory.