Agent Heartbeat problems on datanodes

Explorer

Hi,

For a few weeks now, we have been getting regular warnings on our datanodes:

"This role's host has been out of contact with Cloudera Manager for a concerning amount of time."

cm-agent is consuming a lot of CPU, in particular the MonitorDaemon-R thread (the cm-agent PID is 25598):

top -H -b -n1 -p 25598
top - 14:08:21 up 22 days, 6:17, 1 user, load average: 5,10, 5,07, 4,98
Threads: 35 total, 2 running, 33 sleeping, 0 stopped, 0 zombie
%Cpu(s): 20,1 us, 21,5 sy, 0,0 ni, 56,5 id, 0,0 wa, 0,0 hi, 1,9 si, 0,0 st
KiB Mem : 26370784+total, 7788424 free, 38057140 used, 21786227+buff/cache
KiB Swap: 31457276 total, 31349500 free, 107776 used. 22445827+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25833 root 20 0 3085980 407988 10016 R 87,5 0,2 124:55.80 MonitorDaemon-R
25598 root 20 0 3085980 407988 10016 S 0,0 0,2 28:07.22 cmagent
25748 root 20 0 3085980 407988 10016 S 0,0 0,2 2:49.93 cmagent
25754 root 20 0 3085980 407988 10016 S 0,0 0,2 0:14.20 Audit-Plugin
25755 root 20 0 3085980 407988 10016 S 0,0 0,2 0:13.98 Metadata-Plugin
25756 root 20 0 3085980 407988 10016 S 0,0 0,2 0:14.35 Profile-Plugin
25800 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.15 _TimeoutMonitor
25801 root 20 0 3085980 407988 10016 S 0,0 0,2 0:10.39 HTTPServer _sta
25802 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.00 CP Server Worke
25803 root 20 0 3085980 407988 10016 S 0,0 0,2 0:03.12 CP Server Worke
25804 root 20 0 3085980 407988 10016 S 0,0 0,2 0:04.83 CP Server Worke
25805 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.03 CP Server Worke
25806 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.01 CP Server Worke
25807 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.00 CP Server Worke
25808 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.00 CP Server Worke
25809 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.00 CP Server Worke
25810 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.00 CP Server Worke
25811 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.00 CP Server Worke
25831 root 20 0 3085980 407988 10016 S 0,0 0,2 0:09.61 Monitor-HostMon
25832 root 20 0 3085980 407988 10016 S 0,0 0,2 0:27.20 DnsResolutionMo
25834 root 20 0 3085980 407988 10016 S 0,0 0,2 2:49.89 MonitorDaemon-S
26279 root 20 0 3085980 407988 10016 S 0,0 0,2 3:02.54 WorkerThread
26629 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.36 __run_queue
26630 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.39 __run_queue
26631 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.46 __run_queue
26632 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.80 __run_queue
26637 root 20 0 3085980 407988 10016 S 0,0 0,2 4:24.79 GM KUDU_TSERVER
26639 root 20 0 3085980 407988 10016 S 0,0 0,2 0:02.74 Monitor-SolrSer
26641 root 20 0 3085980 407988 10016 S 0,0 0,2 0:24.39 GM KAFKA_BROKER
26647 root 20 0 3085980 407988 10016 S 0,0 0,2 0:02.14 GM REGIONSERVER
26649 root 20 0 3085980 407988 10016 S 0,0 0,2 0:08.91 GM NODEMANAGER
26651 root 20 0 3085980 407988 10016 S 0,0 0,2 0:00.31 GM OZONE_DATANO
26653 root 20 0 3085980 407988 10016 S 0,0 0,2 0:08.82 GM DATANODE
26656 root 20 0 3085980 407988 10016 S 0,0 0,2 0:06.44 GM IMPALAD
26657 root 20 0 3085980 407988 10016 R 0,0 0,2 21:01.25 ImpalaDaemonQue

What are the next steps to identify the root cause of this issue?

(CDP 7.1.6)

Thanks in advance for your help.

 


Master Collaborator

Hello @OlivierT 

Thank you for reaching out.

 

Could you please share the output of the command below, so we can see what is creating this process?

# ps -ef | grep -i 25833

Explorer

Hi,

Sorry for the late reply; I was off for a few days.

ps -ef | grep -i 25833 returns nothing.
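
A likely reason for the empty result: top -H lists threads, so 25833 is most probably a thread ID (TID) inside the cm-agent process 25598 rather than a separate PID, and ps -ef prints only one line per process. Below is a minimal sketch for listing the threads of a process, assuming a Linux host with /proc mounted. list_threads is a hypothetical helper name used only for illustration; it is not a Cloudera tool.

```shell
#!/bin/sh
# list_threads PID: print "TID NAME" for every thread of PID, read from /proc.
# Hypothetical helper for illustration; not part of cm-agent or Cloudera Manager.
list_threads() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        # Skip silently if the process does not exist or the entry vanished.
        [ -r "$task/comm" ] || continue
        printf '%s %s\n' "${task##*/}" "$(cat "$task/comm")"
    done
}

# Example with the IDs from this thread (run on the affected datanode):
#   list_threads 25598 | grep 25833
# Equivalent checks with procps, if installed:
#   ps -T -p 25598
#   ps -eLf | grep 25833
```

If 25833 shows up here as MonitorDaemon-R, that confirms the CPU is being used by a thread of the agent itself and not by a child process it spawned.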