Created on 11-15-2019 11:09 AM - last edited on 11-15-2019 01:09 PM by cjervis
my CDH Cluster has been online more than half year, all the machines and service is normal.
but recently , there is one host agent always under bad healthy, and Cloudera Manager will show the red color on this host.
then I check the agent log as below:
[16/Nov/2019 02:20:11 +0000] 1142 MainThread agent WARNING Long HB processing time: 9.52771306038
[16/Nov/2019 02:20:52 +0000] 1142 MainThread agent WARNING Long HB processing time: 34.8044610023
[16/Nov/2019 02:20:52 +0000] 1142 MainThread agent WARNING Delayed HB: 19s since last
[16/Nov/2019 02:21:14 +0000] 1142 MainThread agent WARNING Long HB processing time: 22.3401939869
[16/Nov/2019 02:21:14 +0000] 1142 MainThread agent WARNING Delayed HB: 7s since last
[16/Nov/2019 02:21:51 +0000] 1142 MainThread agent WARNING Long HB processing time: 36.7459609509
[16/Nov/2019 02:21:51 +0000] 1142 MainThread agent WARNING Delayed HB: 21s since last
[16/Nov/2019 02:22:24 +0000] 1142 MainThread agent WARNING Long HB processing time: 33.0430369377
[16/Nov/2019 02:22:24 +0000] 1142 MainThread agent WARNING Delayed HB: 18s since last
[16/Nov/2019 02:23:04 +0000] 1142 MainThread agent WARNING Long HB processing time: 40.0112349987
[16/Nov/2019 02:23:04 +0000] 1142 MainThread agent WARNING Delayed HB: 25s since last
[16/Nov/2019 02:23:35 +0000] 1142 MainThread agent WARNING Long HB processing time: 30.6218628883
[16/Nov/2019 02:23:35 +0000] 1142 MainThread agent WARNING Delayed HB: 15s since last
[16/Nov/2019 02:24:17 +0000] 1142 MainThread agent WARNING Long HB processing time: 42.0491158962
[16/Nov/2019 02:24:17 +0000] 1142 MainThread agent WARNING Delayed HB: 27s since last
[16/Nov/2019 02:24:55 +0000] 1142 MainThread heartbeat_tracker INFO HB stats (seconds): num:17 LIFE_MIN:0.01 min:0.02 mean:0.18 max:0.53 LIFE_MAX:1.67
[16/Nov/2019 02:24:56 +0000] 1142 MainThread agent WARNING Long HB processing time: 38.8551709652
[16/Nov/2019 02:24:56 +0000] 1142 MainThread agent WARNING Delayed HB: 23s since last
[16/Nov/2019 02:25:21 +0000] 1142 MainThread agent WARNING Long HB processing time: 25.7106430531
[16/Nov/2019 02:25:21 +0000] 1142 MainThread agent WARNING Delayed HB: 10s since last
[16/Nov/2019 02:29:03 +0000] 1142 MainThread agent WARNING Long HB processing time: 169.729107141
[16/Nov/2019 02:29:03 +0000] 1142 MainThread agent WARNING Delayed HB: 154s since last
[16/Nov/2019 02:33:56 +0000] 1142 MainThread agent WARNING Long HB processing time: 292.624284029
[16/Nov/2019 02:33:56 +0000] 1142 MainThread agent WARNING Delayed HB: 277s since last
I am going to check the server cpu load and other things, it seems nothing happened. then I restart agent, but it will happen hours later or one day later , so my questions is:
1) how this error happened ?
thanks
Created 12-10-2019 08:37 AM
Hi @iamfromsky ,
Thanks for keeping us posted about the issue.
Li Wang, Technical Solution Manager
Created 11-15-2019 01:53 PM
Hi @iamfromsky ,
Could you please share your CDH/CM version?
As you mentioned, I would definitely check the resources (CPU, memory, free spaces, /var/log/messages, network etc) on that host.
Also, check service monitor host and service monitor logs and see if that reveals anything.
Thanks,
Li
Li Wang, Technical Solution Manager
Created 11-15-2019 07:03 PM
CDH6.2.1,i think also should check host monitor log,but find nothing。
Created on 11-21-2019 09:34 PM - edited 11-21-2019 09:48 PM
there are 5 machines happened this issue recently, agent took long time to contact monitor server. and it's fine after I restart agent. one thing I can sure resource usage is low, so I think there are other reasons .
the below is healthy graph history , every time this issue happened I was going to check the monitor logs, host monitor, service monitor, but find nothing.
then, I am going to using Cloudera Manager network performance tools to check , it's failed on these server. but why failed?
could you give me some advises how to find root cause this kind issue ? is it because network ?
Hi @iamfromsky ,
It does sound like it may be a connection issue.
You can run host inspector to see if CM detects any issues:
CM -> Hosts -> All Hosts -> Inspect All hosts button
You can also run the Network Performance Inspector, which allows you to examine the network performance among hosts and clusters managed by an instance of Cloudera Manager.
Created 11-22-2019 09:38 PM
I have inspected many times in CM, it's failed if the machine happened this issue. then I run the network performance script manually, it's success. that's very strange.
the script is:
/opt/cloudera/cm-agent/service/perf/host_perf_diag.py "input.json" "/tmp/result.json"
the above command is copied from CM, as we know CM will show the command when network inspecting.
so what's the diff between CM inspect and manual inspect ?
Created 11-23-2019 10:59 PM
today I continue to troubleshooting this issue since there are two servers happened this issue.
I always ask myself one question: what's difference between normal server and ab-normal server ?
1) the ab-normal server agent logs show me: heartbeat took long time 40-100s, even more, but the other servers took a little time: ~10s
so why the ab-normal server took so long time to send heartbeat?
ab-normal server agent cpu usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5702 impala 20 0 46.420g 0.012t 46348 S 355.0 9.7 7698:58 impalad
1133 root 20 0 2593756 211216 4320 S 100.7 0.2 2850:31 cmagent
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27573 impala 20 0 44.321g 0.011t 47532 S 190.0 9.2 8452:28 impalad
11571 root 20 0 2591080 198476 5032 S 102.5 0.2 582:13.60 cmagent
normal server agent cpu usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7584 impala 20 0 44.161g 9.492g 45824 S 142.9 7.5 8847:26 impalad
9046 bigdata+ 20 0 4647812 2.628g 26896 S 3.6 2.1 2207:09 java
9509 bigdata+ 20 0 4246940 1.324g 26612 S 3.6 1.1 707:37.70 java
11674 root 20 0 2456080 118900 3772 S 3.6 0.1 1608:50 cmagent
7584 impala 20 0 44.005g 0.010t 45824 S 56.4 8.3 8851:52 impalad
814 root 20 0 21644 1140 820 S 0.9 0.0 4:13.31 irqbalance
9506 bigdata+ 20 0 4252672 1.283g 26604 S 0.9 1.0 706:42.33 java
9512 bigdata+ 20 0 4258328 1.072g 26612 S 0.9 0.9 712:49.32 java
11674 root 20 0 2456336 119008 3772 S 0.9 0.1 1608:52 cmagent
after check many times agent cpu usage on normal server and ab-normal server I can sure one thing right now , the ab-normal server agent cpu usage always 100~, but normal server cpu usage below 10.
so far, what I can see the difference between normal and abnormal is agent cpu usage. so I think because agent load is too high, so the server send heartbeat to Cloudera server take long time, so this issue happened. that's why sometimes it's because heathy after I restart agent.
so I am going to check what's problem of this agent process:
[root@oyoshbddwdnprd4 cloudera-scm-agent]# ps -eLo pid,lwp,pcpu |grep 11571
11571 11571 0.8
11571 13216 0.1
11571 13246 0.0
11571 13247 0.0
11571 13248 0.0
11571 14022 0.0
11571 14023 0.0
11571 14025 0.0
11571 14026 0.0
11571 14027 0.0
11571 14028 0.0
11571 14029 0.0
11571 14030 0.0
11571 14031 0.0
11571 14032 0.0
11571 14033 0.0
11571 14034 0.0
11571 14213 0.0
11571 14214 0.0
11571 14215 2.4
11571 14216 0.1
11571 14355 0.1
11571 15321 0.0
11571 15322 0.0
11571 15323 0.0
11571 15324 0.0
11571 15337 0.0
11571 15338 89.2
11571 15339 0.0
11571 15347 0.0
11571 15353 0.0
I can see on thread cpu usage is very high, so may I consider agent cpu usage is to high is because this thread ? let me check this thread: 15338
Thread 5 (Thread 0x7fbf6f7fe700 (LWP 15338)):
#0 0x00007fbff324060e in __memcpy_ssse3_back () from /lib64/libc.so.6
#1 0x00007fbff3e6cc58 in string_concat () from /lib64/libpython2.7.so.1.0
#2 0x00007fbff3e6ea05 in PyString_Concat () from /lib64/libpython2.7.so.1.0
#3 0x00007fbff3e1d6b1 in string_concatenate () from /lib64/libpython2.7.so.1.0
#4 0x00007fbff3ec0729 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#5 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#6 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#8 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#10 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#13 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#14 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#15 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x00007fbff3e7ee27 in slot_tp_call () from /lib64/libpython2.7.so.1.0
#17 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#18 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#19 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#20 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#21 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#22 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#23 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#24 0x00007fbff3e7ee27 in slot_tp_call () from /lib64/libpython2.7.so.1.0
#25 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#26 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#27 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#28 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#29 0x00007fbff3e4ca6d in function_call () from /lib64/libpython2.7.so.1.0
#30 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#31 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#32 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#33 0x00007fbff3e7ea87 in slot_tp_init () from /lib64/libpython2.7.so.1.0
#34 0x00007fbff3e7d79f in type_call () from /lib64/libpython2.7.so.1.0
#35 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#36 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#37 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#38 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#39 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#40 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#41 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#42 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#43 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#44 0x00007fbff3ebc236 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#45 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#46 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#47 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#48 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#49 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#50 0x00007fbff3e4ca6d in function_call () from /lib64/libpython2.7.so.1.0
#51 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#52 0x00007fbff3ebb6fd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#53 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#54 0x00007fbff3ec053c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#55 0x00007fbff3ec06bd in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#56 0x00007fbff3ec303d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#57 0x00007fbff3e4c978 in function_call () from /lib64/libpython2.7.so.1.0
#58 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#59 0x00007fbff3e36a55 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#60 0x00007fbff3e27a63 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#61 0x00007fbff3eb98f7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#62 0x00007fbff3ef1822 in t_bootstrap () from /lib64/libpython2.7.so.1.0
#63 0x00007fbff3bc7e25 in start_thread () from /lib64/libpthread.so.0
#64 0x00007fbff31ec34d in clone () from /lib64/libc.so.6
so could you tell me what's this thread doing? if I now the thread is doing maybe I can find the root cause, thanks.
Created 12-03-2019 10:42 AM
04/Dec/2019 01:14:54 +0000] 28842 MainThread agent WARNING Long HB processing time: 12.050262928
[04/Dec/2019 01:15:03 +0000] 28842 MainThread agent WARNING Long HB processing time: 6.31600308418
[04/Dec/2019 01:15:18 +0000] 28842 MainThread agent WARNING Long HB processing time: 5.97873497009
[04/Dec/2019 01:15:34 +0000] 28842 MainThread agent WARNING Long HB processing time: 7.34343314171
[04/Dec/2019 01:15:54 +0000] 28842 MainThread agent WARNING Long HB processing time: 12.1208558083
[04/Dec/2019 01:16:14 +0000] 28842 MainThread agent WARNING Long HB processing time: 16.9043488503
[04/Dec/2019 01:16:26 +0000] 28842 MainThread agent WARNING Long HB processing time: 12.2732441425
[04/Dec/2019 01:16:47 +0000] 28842 MainThread agent WARNING Long HB processing time: 18.1826350689
[04/Dec/2019 01:16:47 +0000] 28842 MainThread agent WARNING Delayed HB: 3s since last
[04/Dec/2019 01:16:49 +0000] 28842 MonitorDaemon-Reporter throttling_logger WARNING Failed to get swap memory usage for process 13345: list index out of range
[04/Dec/2019 01:18:02 +0000] 28842 MainThread agent WARNING Long HB processing time: 15.3277680874
[04/Dec/2019 01:18:27 +0000] 28842 MainThread agent WARNING Long HB processing time: 9.59771895409
[04/Dec/2019 01:18:47 +0000] 28842 MainThread agent WARNING Long HB processing time: 14.4301588535
[04/Dec/2019 01:18:53 +0000] 28842 MainThread agent WARNING Long HB processing time: 5.77177095413
[04/Dec/2019 01:19:17 +0000] 28842 MainThread agent WARNING Long HB processing time: 14.8392968178
[04/Dec/2019 01:19:33 +0000] 28842 MainThread agent WARNING Long HB processing time: 15.5372338295
[04/Dec/2019 01:20:50 +0000] 28842 MainThread agent WARNING Long HB processing time: 46.7583889961
[04/Dec/2019 01:20:50 +0000] 28842 MainThread agent WARNING Delayed HB: 31s since last
[04/Dec/2019 01:22:24 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:31 LIFE_MIN:0.00 min:0.02 mean:0.28 max:2.14 LIFE_MAX:4.53
[04/Dec/2019 01:22:25 +0000] 28842 MainThread throttling_logger INFO (12 skipped) Identified java component java8 with full version java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) for requested version 8.
[04/Dec/2019 01:22:26 +0000] 28842 MainThread agent WARNING Long HB processing time: 95.7612428665
[04/Dec/2019 01:22:26 +0000] 28842 MainThread agent WARNING Delayed HB: 80s since last
[04/Dec/2019 01:22:32 +0000] 28842 MainThread agent WARNING Long HB processing time: 6.20306706429
[04/Dec/2019 01:24:18 +0000] 28842 MainThread agent WARNING Long HB processing time: 97.3476579189
[04/Dec/2019 01:24:18 +0000] 28842 MainThread agent WARNING Delayed HB: 82s since last
[04/Dec/2019 01:25:32 +0000] 28842 MainThread agent WARNING Long HB processing time: 73.8358900547
[04/Dec/2019 01:25:32 +0000] 28842 MainThread agent WARNING Delayed HB: 58s since last
[04/Dec/2019 01:26:21 +0000] 28842 MainThread agent WARNING Long HB processing time: 48.3692760468
[04/Dec/2019 01:26:21 +0000] 28842 MainThread agent WARNING Delayed HB: 33s since last
[04/Dec/2019 01:28:50 +0000] 28842 MainThread agent WARNING Long HB processing time: 149.57425499
[04/Dec/2019 01:28:50 +0000] 28842 MainThread agent WARNING Delayed HB: 134s since last
[04/Dec/2019 01:30:42 +0000] 28842 MainThread agent WARNING Long HB processing time: 111.463696957
[04/Dec/2019 01:30:42 +0000] 28842 MainThread agent WARNING Delayed HB: 96s since last
[04/Dec/2019 01:30:47 +0000] 28842 MainThread agent WARNING Long HB processing time: 5.08097100258
[04/Dec/2019 01:32:58 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:8 LIFE_MIN:0.00 min:0.11 mean:0.71 max:1.96 LIFE_MAX:4.53
[04/Dec/2019 01:33:00 +0000] 28842 MainThread agent WARNING Long HB processing time: 122.480878115
[04/Dec/2019 01:33:00 +0000] 28842 MainThread agent WARNING Delayed HB: 107s since last
[04/Dec/2019 01:34:10 +0000] 28842 MainThread agent WARNING Long HB processing time: 69.9775719643
[04/Dec/2019 01:34:10 +0000] 28842 MainThread agent WARNING Delayed HB: 55s since last
[04/Dec/2019 01:35:45 +0000] 28842 MainThread agent WARNING Long HB processing time: 95.0239050388
[04/Dec/2019 01:35:45 +0000] 28842 MainThread agent WARNING Delayed HB: 80s since last
[04/Dec/2019 01:37:42 +0000] 28842 MainThread agent WARNING Long HB processing time: 117.279044151
[04/Dec/2019 01:37:42 +0000] 28842 MainThread agent WARNING Delayed HB: 102s since last
[04/Dec/2019 01:40:33 +0000] 28842 MainThread agent WARNING Long HB processing time: 170.700104952
[04/Dec/2019 01:40:33 +0000] 28842 MainThread agent WARNING Delayed HB: 155s since last
[04/Dec/2019 01:40:43 +0000] 28842 MainThread agent WARNING Long HB processing time: 10.2622208595
[04/Dec/2019 01:41:17 +0000] 28842 MainThread agent WARNING Long HB processing time: 28.4504780769
[04/Dec/2019 01:41:17 +0000] 28842 MainThread agent WARNING Delayed HB: 13s since last
[04/Dec/2019 01:43:17 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:7 LIFE_MIN:0.00 min:0.02 mean:0.94 max:1.48 LIFE_MAX:4.53
[04/Dec/2019 01:43:18 +0000] 28842 MainThread agent WARNING Long HB processing time: 121.649900913
[04/Dec/2019 01:43:18 +0000] 28842 MainThread agent WARNING Delayed HB: 106s since last
[04/Dec/2019 01:44:11 +0000] 28842 MainThread agent WARNING Long HB processing time: 52.2356939316
[04/Dec/2019 01:44:11 +0000] 28842 MainThread agent WARNING Delayed HB: 37s since last
[04/Dec/2019 01:44:28 +0000] 28842 MainThread agent WARNING Long HB processing time: 17.3855171204
[04/Dec/2019 01:47:23 +0000] 28842 MainThread agent WARNING Long HB processing time: 175.126209021
[04/Dec/2019 01:47:24 +0000] 28842 MainThread agent WARNING Delayed HB: 160s since last
[04/Dec/2019 01:48:27 +0000] 28842 MainThread agent WARNING Long HB processing time: 63.2303080559
[04/Dec/2019 01:48:27 +0000] 28842 MainThread agent WARNING Delayed HB: 48s since last
[04/Dec/2019 01:49:00 +0000] 28842 MonitorDaemon-Reporter throttling_logger ERROR (22 skipped) Role_extractor for STATESTORE is not initialized correctly.
None
[04/Dec/2019 01:49:12 +0000] 28842 MainThread agent WARNING Long HB processing time: 45.4053838253
[04/Dec/2019 01:49:13 +0000] 28842 MainThread agent WARNING Delayed HB: 31s since last
[04/Dec/2019 01:51:19 +0000] 28842 MainThread agent WARNING Long HB processing time: 125.514182091
[04/Dec/2019 01:51:19 +0000] 28842 MainThread agent WARNING Delayed HB: 110s since last
[04/Dec/2019 01:52:38 +0000] 28842 MainThread agent WARNING Long HB processing time: 78.3114128113
[04/Dec/2019 01:52:38 +0000] 28842 MainThread agent WARNING Delayed HB: 63s since last
[04/Dec/2019 01:52:45 +0000] 28842 MainThread agent WARNING Long HB processing time: 6.34388899803
[04/Dec/2019 01:53:03 +0000] 28842 MainThread agent WARNING Long HB processing time: 9.70307207108
[04/Dec/2019 01:53:30 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:10 LIFE_MIN:0.00 min:0.06 mean:0.58 max:1.79 LIFE_MAX:4.53
[04/Dec/2019 01:53:30 +0000] 28842 MainThread throttling_logger INFO (8 skipped) Identified java component java8 with full version java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) for requested version 8.
[04/Dec/2019 01:53:30 +0000] 28842 MainThread agent WARNING Long HB processing time: 21.8870470524
[04/Dec/2019 01:53:30 +0000] 28842 MainThread agent WARNING Delayed HB: 6s since last
[04/Dec/2019 01:53:46 +0000] 28842 MainThread agent WARNING Long HB processing time: 16.0326740742
[04/Dec/2019 01:54:08 +0000] 28842 MainThread agent WARNING Long HB processing time: 22.1611609459
[04/Dec/2019 01:54:08 +0000] 28842 MainThread agent WARNING Delayed HB: 7s since last
[04/Dec/2019 01:54:14 +0000] 28842 MainThread agent WARNING Long HB processing time: 6.01884293556
[04/Dec/2019 01:54:34 +0000] 28842 MainThread agent WARNING Long HB processing time: 10.3302299976
[04/Dec/2019 01:54:45 +0000] 28842 MainThread agent WARNING Long HB processing time: 6.26238012314
[04/Dec/2019 01:54:59 +0000] 28842 MainThread agent WARNING Long HB processing time: 5.61235713959
[04/Dec/2019 01:55:48 +0000] 28842 MainThread agent WARNING Long HB processing time: 39.5674409866
[04/Dec/2019 01:55:48 +0000] 28842 MainThread agent WARNING Delayed HB: 24s since last
[04/Dec/2019 01:56:04 +0000] 28842 MainThread agent WARNING Long HB processing time: 15.7047331333
[04/Dec/2019 01:56:45 +0000] 28842 MainThread agent WARNING Long HB processing time: 26.3552570343
[04/Dec/2019 01:56:45 +0000] 28842 MainThread agent WARNING Delayed HB: 11s since last
[04/Dec/2019 01:58:15 +0000] 28842 MainThread agent WARNING Long HB processing time: 44.8799829483
[04/Dec/2019 01:58:15 +0000] 28842 MainThread agent WARNING Delayed HB: 29s since last
[04/Dec/2019 01:59:02 +0000] 28842 MainThread agent WARNING Long HB processing time: 47.1872057915
[04/Dec/2019 01:59:02 +0000] 28842 MainThread agent WARNING Delayed HB: 32s since last
[04/Dec/2019 01:59:17 +0000] 28842 MainThread agent WARNING Long HB processing time: 14.3799068928
[04/Dec/2019 02:00:27 +0000] 28842 MainThread agent WARNING Long HB processing time: 54.5242679119
[04/Dec/2019 02:00:27 +0000] 28842 MainThread agent WARNING Delayed HB: 39s since last
[04/Dec/2019 02:01:07 +0000] 28842 MonitorDaemon-Reporter throttling_logger INFO (10 skipped) Descendants user CPU lower than expected for process 7167: 42154365.0, 42134965.36
[04/Dec/2019 02:02:45 +0000] 28842 MainThread agent WARNING Long HB processing time: 137.669925213
[04/Dec/2019 02:02:45 +0000] 28842 MainThread agent WARNING Delayed HB: 122s since last
[04/Dec/2019 02:04:41 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:20 LIFE_MIN:0.00 min:0.02 mean:0.52 max:2.48 LIFE_MAX:4.53
[04/Dec/2019 02:04:42 +0000] 28842 MainThread agent WARNING Long HB processing time: 117.50967598
[04/Dec/2019 02:04:42 +0000] 28842 MainThread agent WARNING Delayed HB: 102s since last
[04/Dec/2019 02:07:01 +0000] 28842 MainThread agent WARNING Long HB processing time: 138.424443007
[04/Dec/2019 02:07:01 +0000] 28842 MainThread agent WARNING Delayed HB: 123s since last
[04/Dec/2019 02:09:43 +0000] 28842 MainThread agent WARNING Long HB processing time: 162.216438055
[04/Dec/2019 02:09:44 +0000] 28842 MainThread agent WARNING Delayed HB: 147s since last
[04/Dec/2019 02:11:34 +0000] 28842 MainThread agent WARNING Long HB processing time: 109.956519842
[04/Dec/2019 02:11:34 +0000] 28842 MainThread agent WARNING Delayed HB: 95s since last
[04/Dec/2019 02:12:10 +0000] 28842 MainThread agent WARNING Long HB processing time: 36.4911990166
[04/Dec/2019 02:12:11 +0000] 28842 MainThread agent WARNING Delayed HB: 21s since last
[04/Dec/2019 02:12:17 +0000] 28842 MainThread agent WARNING Long HB processing time: 6.5263209343
[04/Dec/2019 02:13:37 +0000] 28842 MainThread agent WARNING Long HB processing time: 71.4897689819
[04/Dec/2019 02:13:37 +0000] 28842 MainThread agent WARNING Delayed HB: 56s since last
[04/Dec/2019 02:15:11 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:7 LIFE_MIN:0.00 min:0.20 mean:1.26 max:1.96 LIFE_MAX:4.53
[04/Dec/2019 02:15:13 +0000] 28842 MainThread agent WARNING Long HB processing time: 95.5481140614
[04/Dec/2019 02:15:13 +0000] 28842 MainThread agent WARNING Delayed HB: 80s since last
[04/Dec/2019 02:17:41 +0000] 28842 MainThread agent WARNING Long HB processing time: 147.541055918
[04/Dec/2019 02:17:41 +0000] 28842 MainThread agent WARNING Delayed HB: 132s since last
[04/Dec/2019 02:17:52 +0000] 28842 MainThread agent WARNING Long HB processing time: 11.1138081551
[04/Dec/2019 02:18:09 +0000] 28842 MainThread agent WARNING Long HB processing time: 13.3887140751
[04/Dec/2019 02:21:01 +0000] 28842 MainThread agent WARNING Long HB processing time: 170.292068958
[04/Dec/2019 02:21:01 +0000] 28842 MainThread agent WARNING Delayed HB: 155s since last
[04/Dec/2019 02:21:11 +0000] 28842 MainThread agent WARNING Long HB processing time: 9.28128290176
[04/Dec/2019 02:26:27 +0000] 28842 MainThread agent ERROR Heartbeating to 10.203.3.97:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
response = self.requestor.request('heartbeat', heartbeat_data)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
return self.issue_request(call_request, message_name, request_datum)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
call_response = self.transceiver.transceive(call_request)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
result = self.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 487, in read_framed_message
response = self.conn.getresponse()
File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
[04/Dec/2019 02:26:27 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:6 LIFE_MIN:0.00 min:0.02 mean:0.62 max:1.55 LIFE_MAX:4.53
[04/Dec/2019 02:26:27 +0000] 28842 MainThread agent WARNING Long HB processing time: 310.67294383
[04/Dec/2019 02:26:28 +0000] 28842 MainThread agent WARNING Delayed HB: 295s since last
[04/Dec/2019 02:30:28 +0000] 28842 MainThread throttling_logger INFO (10 skipped) Identified java component java8 with full version java version "1.8.0_181" Java(TM) SE Runtime Environment (build 1.8.0_181-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) for requested version 8.
[04/Dec/2019 02:30:29 +0000] 28842 MainThread agent WARNING Long HB processing time: 241.416838169
[04/Dec/2019 02:30:29 +0000] 28842 MainThread agent WARNING Delayed HB: 226s since last
[04/Dec/2019 02:37:11 +0000] 28842 MainThread agent ERROR Heartbeating to 10.203.3.97:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1396, in _send_heartbeat
response = self.requestor.request('heartbeat', heartbeat_data)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
return self.issue_request(call_request, message_name, request_datum)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
call_response = self.transceiver.transceive(call_request)
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
result = self.read_framed_message()
File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 487, in read_framed_message
response = self.conn.getresponse()
File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
response.begin()
File "/usr/lib64/python2.7/httplib.py", line 444, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
[04/Dec/2019 02:37:11 +0000] 28842 MainThread heartbeat_tracker INFO HB stats (seconds): num:2 LIFE_MIN:0.00 min:0.44 mean:0.48 max:0.53 LIFE_MAX:4.53
[04/Dec/2019 02:37:11 +0000] 28842 MainThread agent WARNING Long HB processing time: 402.084249973
[04/Dec/2019 02:37:11 +0000] 28842 MainThread agent WARNING Delayed HB: 387s since last
could you get useful information from the above logs ?
I can see the logs show me heartbeat failed, but I don't know why ? could you give me some advises?
thank you very much .
Created 12-03-2019 11:02 AM
OS version is : CentOS 7.4 , it doesn't have swap space, is this a matter ? the below log I am not sure it has connection to swap, if Cloudera manager require swap, I can try to add swap space.
[04/Dec/2019 01:16:49 +0000] 28842 MonitorDaemon-Reporter throttling_logger WARNING Failed to get swap memory usage for process 13345: list index out of range
Created 12-03-2019 02:36 PM
Hi @iamfromsky ,
Thanks for the detailed information with stack trace. I did some research and think this may fit an old bug we had in the agent code which is triggering this memcpy. Ultimately the problem is due to one log file which happens to have some insanely long log lines.
Some questions for you:
Suggestion for you to see if you are hit by this issue:
Note: Before you clear the logs out, please make a backup in case you need to do a RCA to see which service is causing the long line.
If the issue still exists after doing above, then you should try to collect below information for further analysis:
1) collect top/pstack for the agent process
2) Run "SIGQUIT" on the agent PID,
e.g.# kill -SIGQUIT <pid>
3) Provide the agent logs.
4) Provide the output of the command "netstat -anp"
Thanks and hope this helps,
Li
Li Wang, Technical Solution Manager