ERROR Installing Agent

Explorer

Hi everyone

When installing cloudera-scm-agent, the installation fails at the heartbeat check. However, the backend logs show no errors; they even contain INFO-level entries indicating that heartbeats were detected successfully. What could be the root cause?

Attachments: agent.log, installing log, server.log, 企业微信截图_17411675661642.png

1 ACCEPTED SOLUTION

Master Collaborator

Hello @xiaocao 

Thank you for your update

I was actually concerned about the settings below:

Use TLS Encryption for Agents

Use TLS Authentication of Agents to Server

Anyway, if you are seeing successful heartbeats in the agent log and server_host is correctly set to the CM server in /etc/cloudera-scm-agent/config.ini, then there are two things we need to check:

a.) The CM Server log (/var/log/cloudera-scm-server/cloudera-scm-server.log) on the CM server host

b.) A network packet capture on the CM server side on port 7182, analyzed to see whether the agent's requests are reaching the server
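
For checks a.) and b.), a minimal sketch of how they might be run on the CM server host is shown here. The function names server_log_mentions and capture_heartbeats are hypothetical helpers, not Cloudera tools; the sketch assumes tcpdump is installed, that the capture runs as root, and that the agent's IP address actually appears in the server log lines of interest (which may not be the case for every message).

```shell
#!/bin/sh
# server_log_mentions LOG AGENT_IP
# Print the lines of LOG that contain AGENT_IP as a whole word.
# -F treats the IP as a fixed string (dots are not regex wildcards);
# -w avoids matching 192.168.201.15 inside 192.168.201.157.
server_log_mentions() {
    grep -Fw -- "$2" "$1"
}

# capture_heartbeats AGENT_IP OUTFILE
# Capture traffic between the agent and port 7182 into a pcap file.
# Stop with Ctrl-C, then open OUTFILE in tcpdump -r or Wireshark.
capture_heartbeats() {
    tcpdump -i any -nn -w "$2" host "$1" and port 7182
}

# Example usage (run on the CM server host; <AGENT_IP> is a placeholder):
#   server_log_mentions /var/log/cloudera-scm-server/cloudera-scm-server.log <AGENT_IP>
#   capture_heartbeats <AGENT_IP> /tmp/cm-7182.pcap
```

If the capture stays empty while the agent reports successful heartbeats, that would suggest the agent is talking to a different host or that something in between is dropping the traffic.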

17 REPLIES

Master Collaborator

Hello @xiaocao 

Thank you for reaching out to Cloudera Support

Can you please check /etc/cloudera-scm-agent/config.ini on the affected node? Is server_host pointing to the CM server?

What are the CM and CDP versions here?

Also check /var/log/messages on the affected node and see whether you find any errors regarding CM.
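
As a rough sketch, the relevant settings can be pulled out of config.ini without reading the whole file. The helper name ini_get is hypothetical, and it assumes the file uses plain key=value lines with '#' comments, as in the agent's default config.ini.

```shell
#!/bin/sh
# ini_get FILE KEY
# Print the value of KEY from a simple key=value file.
# Commented-out lines ("# key=value") are ignored because the key must
# be the first non-blank text on the line. If KEY appears more than
# once, the last occurrence is printed.
ini_get() {
    file=$1
    key=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\(.*\)\$/\1/p" "$file" \
        | sed 's/[[:space:]]*$//' \
        | tail -n 1
}

# Example usage on the affected node:
#   ini_get /etc/cloudera-scm-agent/config.ini server_host
#   ini_get /etc/cloudera-scm-agent/config.ini server_port
#   ini_get /etc/cloudera-scm-agent/config.ini use_tls
```

The server_host value should match the CM server's address, and use_tls should agree with the TLS settings configured on the CM server side.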

 

Explorer

Hi,

The version is 6.3.1.

config.ini:

server_host=192.168.201.157
server_port=7182
max_collection_wait_seconds=10.0
metrics_url_timeout_seconds=30.0
task_metrics_timeout_seconds=5.0
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs
local_filesystem_whitelist=ext2,ext3,ext4,xfs
impala_profile_bundle_max_bytes=1073741824
stacks_log_bundle_max_bytes=1073741824
stacks_log_max_uncompressed_file_size_bytes=5242880
orphan_process_dir_staleness_threshold=5184000
orphan_process_dir_refresh_interval=3600
scm_debug=INFO
dns_resolution_collection_interval_seconds=60
dns_resolution_collection_timeout_seconds=30
use_tls=0
max_cert_depth=9

messages:

Mar 5 17:08:48 0860cdh03vt systemd-logind: New session 15 of user root.
Mar 5 17:08:48 0860cdh03vt systemd: Started Session 15 of user root.
Mar 5 17:10:01 0860cdh03vt systemd: Started Session 16 of user root.
Mar 5 17:10:07 0860cdh03vt systemd: Stopping Cloudera Manager Agent Service...
Mar 5 17:10:08 0860cdh03vt systemd: Stopped Cloudera Manager Agent Service.
Mar 5 17:10:08 0860cdh03vt systemd: Started Cloudera Manager Agent Service.
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread logging INFO SCM Agent (agent) Version: 6.3.1
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread main INFO Executing with arguments: /opt/cloudera/cm-agent/bin/cm agent
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread navigator_thread INFO Creating the Audit Thread
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread navigator_thread INFO Creating the Metadata Thread
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread navigator_thread INFO Creating the Profile Thread
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread main INFO Using existing process limit: (127909, 127909)
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Missing database jar: /usr/share/java/mysql-connector-java.jar (normal, if you're not using this database type)
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Found database jar: /opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Not starting a new session.
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/flood
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor/include
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/cgroups
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread agent INFO Re-using pre-existing directory: /var/run/cloudera-scm-agent/process
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread tmpfs INFO Reusing mounted tmpfs at /var/run/cloudera-scm-agent/process
Mar 5 17:10:09 0860cdh03vt cm: [05/Mar/2025 17:10:09 +0000] 15752 MainThread logging INFO Logging to /var/log/cloudera-scm-agent/cloudera-scm-agent.log
Mar 5 17:10:17 0860cdh03vt cm: /opt/cloudera/cm-agent/lib/python2.7/site-packages/psutil/_pslinux.py:477: RuntimeWarning: dirty, writeback, mapped, commit_limit memory stats couldn't be determined and were set to 0
Mar 5 17:10:17 0860cdh03vt cm: warnings.warn(msg, RuntimeWarning)
Mar 5 17:11:09 0860cdh03vt systemd-logind: Removed session 15.
Mar 5 17:20:01 0860cdh03vt systemd: Started Session 17 of user root.

Master Collaborator

Hello @xiaocao 

Can you try a hard restart of the agent?

# systemctl stop cloudera-scm-agent

# systemctl stop cloudera-scm-supervisord

# systemctl start cloudera-scm-agent

Also, is this a new installation, or was this host registered earlier with some other CM server?

 

Explorer

Hi,

This is a new installation. It is the first time cloudera-scm-agent and cloudera-scm-server have been installed.

Master Collaborator

Hello,

Thank you for your update

Did the hard restart of the agent help?

 

# systemctl stop cloudera-scm-agent

# systemctl stop cloudera-scm-supervisord

# systemctl start cloudera-scm-agent

 

Also, can you share the output of the commands below from the affected node?

# rpm -qa | grep -i cloudera

# free -mh

# df -hT

# netstat -tulpn | grep -i 9000

# netstat -tulpn | grep -i 9001

Explorer

I restarted the agent, but it did not help.

Below is the output of the commands you shared, run on the affected node:

企业微信截图_17411735889617.png

Master Collaborator

Hello @xiaocao 

Can you check whether all three PIDs belong to Cloudera processes?

# ps -ef | grep -i 15833

# ps -ef | grep -i 10429

# ps -ef | grep -i 15752

# netstat <CM_SERVER_IP> 7182

# netstat <CM_SERVER_HOST> 7182

Explorer

Hi,

I checked, and all three PIDs belong to Cloudera processes.

企业微信截图_17411755443251.png
企业微信截图_17411756108624.png

Master Collaborator

Hello @xiaocao 

Sorry, there was a typo. Please run the commands below:

# telnet <CM_SERVER_IP> 7182

# telnet <CM_SERVER_HOST> 7182

# systemctl status cloudera-scm-agent
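
In case telnet is not installed on the affected node, the same reachability check can be sketched with bash's built-in /dev/tcp redirection. This is only an illustration: port_open is a hypothetical helper name, and it assumes bash (built with network redirection support) and the coreutils timeout command are available.

```shell
#!/bin/sh
# port_open HOST PORT [TIMEOUT_SECONDS]
# Return 0 if a TCP connection to HOST:PORT can be opened within the
# timeout (default 5 seconds), non-zero otherwise.
# Inside the inner bash, $0 is HOST and $1 is PORT.
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example usage (placeholders as in the commands above):
#   port_open <CM_SERVER_IP> 7182 && echo "7182 reachable" || echo "7182 NOT reachable"
#   port_open <CM_SERVER_HOST> 7182 && echo "7182 reachable" || echo "7182 NOT reachable"
```

A successful connection only shows that something is listening on port 7182 at that address; it does not by itself confirm that the listener is the CM server.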