Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Cloudera Manager agent bad healthy

avatar
Expert Contributor

my  CDH Cluster has been online more than half year,  all the machines and service is normal.

 

but recently , there is one host agent always under bad healthy, and Cloudera Manager will show the red color on this host.

 

then I check the agent log as below:

[16/Nov/2019 02:20:11 +0000] 1142 MainThread agent WARNING Long HB processing time: 9.52771306038
[16/Nov/2019 02:20:52 +0000] 1142 MainThread agent WARNING Long HB processing time: 34.8044610023
[16/Nov/2019 02:20:52 +0000] 1142 MainThread agent WARNING Delayed HB: 19s since last
[16/Nov/2019 02:21:14 +0000] 1142 MainThread agent WARNING Long HB processing time: 22.3401939869
[16/Nov/2019 02:21:14 +0000] 1142 MainThread agent WARNING Delayed HB: 7s since last
[16/Nov/2019 02:21:51 +0000] 1142 MainThread agent WARNING Long HB processing time: 36.7459609509
[16/Nov/2019 02:21:51 +0000] 1142 MainThread agent WARNING Delayed HB: 21s since last
[16/Nov/2019 02:22:24 +0000] 1142 MainThread agent WARNING Long HB processing time: 33.0430369377
[16/Nov/2019 02:22:24 +0000] 1142 MainThread agent WARNING Delayed HB: 18s since last
[16/Nov/2019 02:23:04 +0000] 1142 MainThread agent WARNING Long HB processing time: 40.0112349987
[16/Nov/2019 02:23:04 +0000] 1142 MainThread agent WARNING Delayed HB: 25s since last
[16/Nov/2019 02:23:35 +0000] 1142 MainThread agent WARNING Long HB processing time: 30.6218628883
[16/Nov/2019 02:23:35 +0000] 1142 MainThread agent WARNING Delayed HB: 15s since last
[16/Nov/2019 02:24:17 +0000] 1142 MainThread agent WARNING Long HB processing time: 42.0491158962
[16/Nov/2019 02:24:17 +0000] 1142 MainThread agent WARNING Delayed HB: 27s since last
[16/Nov/2019 02:24:55 +0000] 1142 MainThread heartbeat_tracker INFO HB stats (seconds): num:17 LIFE_MIN:0.01 min:0.02 mean:0.18 max:0.53 LIFE_MAX:1.67
[16/Nov/2019 02:24:56 +0000] 1142 MainThread agent WARNING Long HB processing time: 38.8551709652
[16/Nov/2019 02:24:56 +0000] 1142 MainThread agent WARNING Delayed HB: 23s since last
[16/Nov/2019 02:25:21 +0000] 1142 MainThread agent WARNING Long HB processing time: 25.7106430531
[16/Nov/2019 02:25:21 +0000] 1142 MainThread agent WARNING Delayed HB: 10s since last
[16/Nov/2019 02:29:03 +0000] 1142 MainThread agent WARNING Long HB processing time: 169.729107141
[16/Nov/2019 02:29:03 +0000] 1142 MainThread agent WARNING Delayed HB: 154s since last
[16/Nov/2019 02:33:56 +0000] 1142 MainThread agent WARNING Long HB processing time: 292.624284029
[16/Nov/2019 02:33:56 +0000] 1142 MainThread agent WARNING Delayed HB: 277s since last

 

I am going to check the server cpu load and other things, it seems nothing happened. then I restart agent, but it will happen hours later or one day later , so my questions is:

1) how this error happened ?

 

thanks

 

 

 

1 ACCEPTED SOLUTION

avatar
Guru

Hi @iamfromsky ,

 

Thanks for keeping us posted about the issue. 

 

Li Wang, Technical Solution Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum

View solution in original post

17 REPLIES 17

avatar
Guru

Hi @iamfromsky ,

 

Could you please share your CDH/CM version?

As you mentioned, I would definitely check the resources (CPU, memory, free spaces, /var/log/messages, network etc) on that host.

Also, check service monitor host and service monitor logs and see if that reveals anything. 

 

Thanks,

Li

Li Wang, Technical Solution Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum

avatar
Expert Contributor

CDH6.2.1,i think also should check host monitor log,but find nothing。

 

 

 

 

avatar
Expert Contributor

there are 5 machines happened  this issue recently, agent took long time to contact monitor server. and it's fine after I restart agent.  one thing I can sure resource usage is low, so I think there are other reasons .

the below is healthy graph history , every time this issue happened  I was going to check the monitor logs, host monitor, service monitor, but find nothing.

then, I am going to using Cloudera Manager network performance tools to check , it's failed on these server.  but why failed? 

could you give me some advises how to find root cause this kind issue ? is it because network ? 

 

 

 

1.png

 

avatar
Expert Contributor

Hi @iamfromsky ,

It does sound like it may be a connection issue.

You can run host inspector to see if CM detects any issues:

CM -> Hosts -> All Hosts -> Inspect All hosts button

 

You can also run the Network Performance Inspector, which allows you to examine the network performance among hosts and clusters managed by an instance of Cloudera Manager.

avatar
Expert Contributor

I have inspected many times in CM, it's failed if the machine happened this issue. then I run the network performance script manually, it's success.  that's very strange. 

the script is: 

/opt/cloudera/cm-agent/service/perf/host_perf_diag.py "input.json" "/tmp/result.json"

the above command is copied from CM, as we know CM will show the command when network inspecting.

so what's the diff between CM inspect and manual inspect ?

 

 

 

 

avatar
Expert Contributor

today I continue to troubleshooting this issue since there are two servers happened this issue. 

I always ask myself one question: what's difference between normal server and ab-normal server ? 

 

1) the ab-normal server agent logs show me: heartbeat took long time 40-100s, even more, but the other servers took a little time: ~10s

so why the ab-normal server took so long time to send heartbeat? 

 

ab-normal server agent cpu usage:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                           
 5702 impala    20   0 46.420g 0.012t  46348 S 355.0  9.7   7698:58 impalad                                                                                                                                                                           
 1133 root      20   0 2593756 211216   4320 S 100.7  0.2   2850:31 cmagent  
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                           
27573 impala    20   0 44.321g 0.011t  47532 S 190.0  9.2   8452:28 impalad                                                                                                                                                                           
11571 root      20   0 2591080 198476   5032 S 102.5  0.2 582:13.60 cmagent  

normal server agent cpu usage:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                           
 7584 impala    20   0 44.161g 9.492g  45824 S 142.9  7.5   8847:26 impalad                                                                                                                                                                           
 9046 bigdata+  20   0 4647812 2.628g  26896 S   3.6  2.1   2207:09 java                                                                                                                                                                              
 9509 bigdata+  20   0 4246940 1.324g  26612 S   3.6  1.1 707:37.70 java                                                                                                                                                                              
11674 root      20   0 2456080 118900   3772 S   3.6  0.1   1608:50 cmagent
 7584 impala    20   0 44.005g 0.010t  45824 S  56.4  8.3   8851:52 impalad                                                                                                                                                                           
  814 root      20   0   21644   1140    820 S   0.9  0.0   4:13.31 irqbalance                                                                                                                                                                        
 9506 bigdata+  20   0 4252672 1.283g  26604 S   0.9  1.0 706:42.33 java                                                                                                                                                                              
 9512 bigdata+  20   0 4258328 1.072g  26612 S   0.9  0.9 712:49.32 java                                                                                                                                                                              
11674 root      20   0 2456336 119008   3772 S   0.9  0.1   1608:52 cmagent 

after check many times agent cpu usage on normal server and ab-normal server I can sure one thing right now , the ab-normal server agent cpu usage always 100~, but normal server cpu usage below 10.

 

so far, what I can see the difference  between normal and abnormal  is agent cpu usage.  so I  think because agent load is too high, so the server send heartbeat to Cloudera server take long time, so this issue happened. that's why sometimes it's because heathy after I restart agent.

 

so I am going to check what's problem of this agent process:

[root@oyoshbddwdnprd4 cloudera-scm-agent]# ps -eLo pid,lwp,pcpu |grep 11571
11571 11571  0.8
11571 13216  0.1
11571 13246  0.0
11571 13247  0.0
11571 13248  0.0
11571 14022  0.0
11571 14023  0.0
11571 14025  0.0
11571 14026  0.0
11571 14027  0.0
11571 14028  0.0
11571 14029  0.0
11571 14030  0.0
11571 14031  0.0
11571 14032  0.0
11571 14033  0.0
11571 14034  0.0
11571 14213  0.0
11571 14214  0.0
11571 14215  2.4
11571 14216  0.1
11571 14355  0.1
11571 15321  0.0
11571 15322  0.0
11571 15323  0.0
11571 15324  0.0
11571 15337  0.0
11571 15338 89.2
11571 15339  0.0
11571 15347  0.0
11571 15353  0.0

 

I can see on thread cpu usage is very high, so may I consider agent cpu usage is to high is because this thread ? let me check this thread: 15338

 

 

Thread 5 (Thread 0x7fbf6f7fe700 (LWP 15338)):
#0  0x00007fbff324060e in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00007fbff3e6cc58 in string_concat () from /lib64/libpython2.7.so.1.0
#2  0x00007fbff3e6ea05 in PyString_Concat () from /lib64/libpython2.7.so.1.0
#3  0x00007fbff3e1d6b1 in string_concatenate () from /lib64/libpython2.7.so.1.0
#4  0x00007fbff3ec0729 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#5  0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#6  0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7  0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8  0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9  0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#10 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#13 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#14 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#15 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x00007fbff3e7ee27 in slot_tp_call () from /lib64/libpython2.7.so.1.0
#17 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#18 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#19 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#20 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#21 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#22 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#23 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#24 0x00007fbff3e7ee27 in slot_tp_call () from /lib64/libpython2.7.so.1.0
#25 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#26 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#27 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#28 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#29 0x00007fbff3e4ca6d in function_call () from /lib64/libpython2.7.so.1.0
#30 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#31 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#32 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#33 0x00007fbff3e7ea87 in slot_tp_init () from /lib64/libpython2.7.so.1.0
#34 0x00007fbff3e7d79f in type_call () from /lib64/libpython2.7.so.1.0
#35 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#36 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#37 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#38 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#39 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#40 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#41 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#42 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#43 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#44 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#45 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#46 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#47 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#48 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#49 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#50 0x00007fbff3e4ca6d in function_call () from /lib64/libpython2.7.so.1.0
#51 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#52 0x00007fbff3ebb6fd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#53 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#54 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#55 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#56 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#57 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#58 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#59 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#60 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#61 0x00007fbff3eb98f7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#62 0x00007fbff3ef1822 in t_bootstrap () from /lib64/libpython2.7.so.1.0
#63 0x00007fbff3bc7e25 in start_thread () from /lib64/libpthread.so.0
#64 0x00007fbff31ec34d in clone () from /lib64/libc.so.6

 

so could you tell me what's this thread doing? if I now the thread is doing maybe I can find the root cause, thanks.

 

avatar
Expert Contributor
04/Dec/2019 01:14:54 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 12.050262928
[04/Dec/2019 01:15:03 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 6.31600308418
[04/Dec/2019 01:15:18 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 5.97873497009
[04/Dec/2019 01:15:34 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 7.34343314171
[04/Dec/2019 01:15:54 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 12.1208558083
[04/Dec/2019 01:16:14 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 16.9043488503
[04/Dec/2019 01:16:26 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 12.2732441425
[04/Dec/2019 01:16:47 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 18.1826350689
[04/Dec/2019 01:16:47 +0000] 28842 MainThread agent        WARNING  Delayed HB: 3s since last
[04/Dec/2019 01:16:49 +0000] 28842 MonitorDaemon-Reporter throttling_logger WARNING  Failed to get swap memory usage for process 13345: list index out of range
[04/Dec/2019 01:18:02 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 15.3277680874
[04/Dec/2019 01:18:27 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 9.59771895409
[04/Dec/2019 01:18:47 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 14.4301588535
[04/Dec/2019 01:18:53 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 5.77177095413
[04/Dec/2019 01:19:17 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 14.8392968178
[04/Dec/2019 01:19:33 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 15.5372338295
[04/Dec/2019 01:20:50 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 46.7583889961
[04/Dec/2019 01:20:50 +0000] 28842 MainThread agent        WARNING  Delayed HB: 31s since last
[04/Dec/2019 01:22:24 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:31 LIFE_MIN:0.00 min:0.02 mean:0.28 max:2.14 LIFE_MAX:4.53
[04/Dec/2019 01:22:25 +0000] 28842 MainThread throttling_logger INFO     (12 skipped) Identified java component java8 with full version java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)  for requested version 8.
[04/Dec/2019 01:22:26 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 95.7612428665
[04/Dec/2019 01:22:26 +0000] 28842 MainThread agent        WARNING  Delayed HB: 80s since last
[04/Dec/2019 01:22:32 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 6.20306706429
[04/Dec/2019 01:24:18 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 97.3476579189
[04/Dec/2019 01:24:18 +0000] 28842 MainThread agent        WARNING  Delayed HB: 82s since last
[04/Dec/2019 01:25:32 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 73.8358900547
[04/Dec/2019 01:25:32 +0000] 28842 MainThread agent        WARNING  Delayed HB: 58s since last
[04/Dec/2019 01:26:21 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 48.3692760468
[04/Dec/2019 01:26:21 +0000] 28842 MainThread agent        WARNING  Delayed HB: 33s since last
[04/Dec/2019 01:28:50 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 149.57425499
[04/Dec/2019 01:28:50 +0000] 28842 MainThread agent        WARNING  Delayed HB: 134s since last
[04/Dec/2019 01:30:42 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 111.463696957
[04/Dec/2019 01:30:42 +0000] 28842 MainThread agent        WARNING  Delayed HB: 96s since last
[04/Dec/2019 01:30:47 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 5.08097100258
[04/Dec/2019 01:32:58 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:8 LIFE_MIN:0.00 min:0.11 mean:0.71 max:1.96 LIFE_MAX:4.53
[04/Dec/2019 01:33:00 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 122.480878115
[04/Dec/2019 01:33:00 +0000] 28842 MainThread agent        WARNING  Delayed HB: 107s since last
[04/Dec/2019 01:34:10 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 69.9775719643
[04/Dec/2019 01:34:10 +0000] 28842 MainThread agent        WARNING  Delayed HB: 55s since last
[04/Dec/2019 01:35:45 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 95.0239050388
[04/Dec/2019 01:35:45 +0000] 28842 MainThread agent        WARNING  Delayed HB: 80s since last
[04/Dec/2019 01:37:42 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 117.279044151
[04/Dec/2019 01:37:42 +0000] 28842 MainThread agent        WARNING  Delayed HB: 102s since last
[04/Dec/2019 01:40:33 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 170.700104952
[04/Dec/2019 01:40:33 +0000] 28842 MainThread agent        WARNING  Delayed HB: 155s since last
[04/Dec/2019 01:40:43 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 10.2622208595
[04/Dec/2019 01:41:17 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 28.4504780769
[04/Dec/2019 01:41:17 +0000] 28842 MainThread agent        WARNING  Delayed HB: 13s since last
[04/Dec/2019 01:43:17 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:7 LIFE_MIN:0.00 min:0.02 mean:0.94 max:1.48 LIFE_MAX:4.53
[04/Dec/2019 01:43:18 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 121.649900913
[04/Dec/2019 01:43:18 +0000] 28842 MainThread agent        WARNING  Delayed HB: 106s since last
[04/Dec/2019 01:44:11 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 52.2356939316
[04/Dec/2019 01:44:11 +0000] 28842 MainThread agent        WARNING  Delayed HB: 37s since last
[04/Dec/2019 01:44:28 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 17.3855171204
[04/Dec/2019 01:47:23 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 175.126209021
[04/Dec/2019 01:47:24 +0000] 28842 MainThread agent        WARNING  Delayed HB: 160s since last
[04/Dec/2019 01:48:27 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 63.2303080559
[04/Dec/2019 01:48:27 +0000] 28842 MainThread agent        WARNING  Delayed HB: 48s since last
[04/Dec/2019 01:49:00 +0000] 28842 MonitorDaemon-Reporter throttling_logger ERROR    (22 skipped) Role_extractor for STATESTORE is not initialized correctly.
None
[04/Dec/2019 01:49:12 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 45.4053838253
[04/Dec/2019 01:49:13 +0000] 28842 MainThread agent        WARNING  Delayed HB: 31s since last
[04/Dec/2019 01:51:19 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 125.514182091
[04/Dec/2019 01:51:19 +0000] 28842 MainThread agent        WARNING  Delayed HB: 110s since last
[04/Dec/2019 01:52:38 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 78.3114128113
[04/Dec/2019 01:52:38 +0000] 28842 MainThread agent        WARNING  Delayed HB: 63s since last
[04/Dec/2019 01:52:45 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 6.34388899803
[04/Dec/2019 01:53:03 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 9.70307207108
[04/Dec/2019 01:53:30 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:10 LIFE_MIN:0.00 min:0.06 mean:0.58 max:1.79 LIFE_MAX:4.53
[04/Dec/2019 01:53:30 +0000] 28842 MainThread throttling_logger INFO     (8 skipped) Identified java component java8 with full version java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)  for requested version 8.
[04/Dec/2019 01:53:30 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 21.8870470524
[04/Dec/2019 01:53:30 +0000] 28842 MainThread agent        WARNING  Delayed HB: 6s since last
[04/Dec/2019 01:53:46 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 16.0326740742
[04/Dec/2019 01:54:08 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 22.1611609459
[04/Dec/2019 01:54:08 +0000] 28842 MainThread agent        WARNING  Delayed HB: 7s since last
[04/Dec/2019 01:54:14 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 6.01884293556
[04/Dec/2019 01:54:34 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 10.3302299976
[04/Dec/2019 01:54:45 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 6.26238012314
[04/Dec/2019 01:54:59 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 5.61235713959
[04/Dec/2019 01:55:48 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 39.5674409866
[04/Dec/2019 01:55:48 +0000] 28842 MainThread agent        WARNING  Delayed HB: 24s since last
[04/Dec/2019 01:56:04 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 15.7047331333
[04/Dec/2019 01:56:45 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 26.3552570343
[04/Dec/2019 01:56:45 +0000] 28842 MainThread agent        WARNING  Delayed HB: 11s since last
[04/Dec/2019 01:58:15 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 44.8799829483
[04/Dec/2019 01:58:15 +0000] 28842 MainThread agent        WARNING  Delayed HB: 29s since last
[04/Dec/2019 01:59:02 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 47.1872057915
[04/Dec/2019 01:59:02 +0000] 28842 MainThread agent        WARNING  Delayed HB: 32s since last
[04/Dec/2019 01:59:17 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 14.3799068928
[04/Dec/2019 02:00:27 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 54.5242679119
[04/Dec/2019 02:00:27 +0000] 28842 MainThread agent        WARNING  Delayed HB: 39s since last
[04/Dec/2019 02:01:07 +0000] 28842 MonitorDaemon-Reporter throttling_logger INFO     (10 skipped) Descendants user CPU lower than expected for process 7167:           42154365.0, 42134965.36
[04/Dec/2019 02:02:45 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 137.669925213
[04/Dec/2019 02:02:45 +0000] 28842 MainThread agent        WARNING  Delayed HB: 122s since last
[04/Dec/2019 02:04:41 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:20 LIFE_MIN:0.00 min:0.02 mean:0.52 max:2.48 LIFE_MAX:4.53
[04/Dec/2019 02:04:42 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 117.50967598
[04/Dec/2019 02:04:42 +0000] 28842 MainThread agent        WARNING  Delayed HB: 102s since last
[04/Dec/2019 02:07:01 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 138.424443007
[04/Dec/2019 02:07:01 +0000] 28842 MainThread agent        WARNING  Delayed HB: 123s since last
[04/Dec/2019 02:09:43 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 162.216438055
[04/Dec/2019 02:09:44 +0000] 28842 MainThread agent        WARNING  Delayed HB: 147s since last
[04/Dec/2019 02:11:34 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 109.956519842
[04/Dec/2019 02:11:34 +0000] 28842 MainThread agent        WARNING  Delayed HB: 95s since last
[04/Dec/2019 02:12:10 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 36.4911990166
[04/Dec/2019 02:12:11 +0000] 28842 MainThread agent        WARNING  Delayed HB: 21s since last
[04/Dec/2019 02:12:17 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 6.5263209343
[04/Dec/2019 02:13:37 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 71.4897689819
[04/Dec/2019 02:13:37 +0000] 28842 MainThread agent        WARNING  Delayed HB: 56s since last
[04/Dec/2019 02:15:11 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:7 LIFE_MIN:0.00 min:0.20 mean:1.26 max:1.96 LIFE_MAX:4.53
[04/Dec/2019 02:15:13 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 95.5481140614
[04/Dec/2019 02:15:13 +0000] 28842 MainThread agent        WARNING  Delayed HB: 80s since last
[04/Dec/2019 02:17:41 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 147.541055918
[04/Dec/2019 02:17:41 +0000] 28842 MainThread agent        WARNING  Delayed HB: 132s since last
[04/Dec/2019 02:17:52 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 11.1138081551
[04/Dec/2019 02:18:09 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 13.3887140751
[04/Dec/2019 02:21:01 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 170.292068958
[04/Dec/2019 02:21:01 +0000] 28842 MainThread agent        WARNING  Delayed HB: 155s since last
[04/Dec/2019 02:21:11 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 9.28128290176
[04/Dec/2019 02:26:27 +0000] 28842 MainThread agent        ERROR    Heartbeating to 10.203.3.97:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
    response = self.requestor.request('heartbeat', heartbeat_data)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 487, in read_framed_message
    response = self.conn.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''
[04/Dec/2019 02:26:27 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:6 LIFE_MIN:0.00 min:0.02 mean:0.62 max:1.55 LIFE_MAX:4.53
[04/Dec/2019 02:26:27 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 310.67294383
[04/Dec/2019 02:26:28 +0000] 28842 MainThread agent        WARNING  Delayed HB: 295s since last
[04/Dec/2019 02:30:28 +0000] 28842 MainThread throttling_logger INFO     (10 skipped) Identified java component java8 with full version java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)  for requested version 8.
[04/Dec/2019 02:30:29 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 241.416838169
[04/Dec/2019 02:30:29 +0000] 28842 MainThread agent        WARNING  Delayed HB: 226s since last
[04/Dec/2019 02:37:11 +0000] 28842 MainThread agent        ERROR    Heartbeating to 10.203.3.97:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
    response = self.requestor.request('heartbeat', heartbeat_data)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 487, in read_framed_message
    response = self.conn.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''
[04/Dec/2019 02:37:11 +0000] 28842 MainThread heartbeat_tracker INFO     HB stats (seconds): num:2 LIFE_MIN:0.00 min:0.44 mean:0.48 max:0.53 LIFE_MAX:4.53
[04/Dec/2019 02:37:11 +0000] 28842 MainThread agent        WARNING  Long HB processing time: 402.084249973
[04/Dec/2019 02:37:11 +0000] 28842 MainThread agent        WARNING  Delayed HB: 387s since last

 

could you get useful information from the above logs ?  

 

I can see the logs show me heartbeat failed, but I don't know why ? could you give me some advises? 

 

thank you very much .

avatar
Expert Contributor

OS version is : CentOS 7.4 , it doesn't have swap space, is this a matter ? the below log I am not sure it has connection to swap, if Cloudera manager require swap, I can try to add swap space. 

 

[04/Dec/2019 01:16:49 +0000] 28842 MonitorDaemon-Reporter throttling_logger WARNING  Failed to get swap memory usage for process 13345: list index out of range

 

avatar
Guru

Hi @iamfromsky ,

 

Thanks for the detailed information with stack trace. I did some research and think this may fit an old bug we had in the agent code which is triggering this memcpy. Ultimately the problem is due to one log file which happens to have some insanely long log lines.

Some questions for you:

  1. What services are installed on this host?
  2. Have you turned on any debugging on any service?

Suggestion for you to see if you are hit by this issue:

  1. On the problematic host, stop CM agent
  2. Clear out all the log files under /var/log directory which are managed by CM
  3. Start CM agent
  4. Then check if the issue is solved

Note: Before you clear the logs out, please make a backup in case you need to do a RCA to see which service is causing the long line.

 

If the issue still exists after doing above, then you should try to collect below information for further analysis:

1) collect top/pstack for the agent process
2) Run "SIGQUIT" on the agent PID,
e.g.# kill -SIGQUIT <pid>
3) Provide the agent logs.
4) Provide the output of the command "netstat -anp"

 

Thanks and hope this helps,

Li

Li Wang, Technical Solution Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum