Python process takes 100% CPU time

Explorer

A python process takes 100% CPU time and runs on just one CPU core (the NameNode server has 24 cores). Because of the 100% utilization, Cloudera Manager reports Bad Health issues such as clock offset and agent status.

What could be the reason for the 100% CPU utilization by python?

 

After a couple of days Cloudera Manager became stable and the python process took just 1% CPU time, but today the problem occurred again.

There seem to be 7 days between occurrences of the problem.

Is it related to the Scheduled Diagnostic Data Collection Frequency setting in Cloudera Manager, which is 7 days?

In cloudera-scm-agent.log we see:

[11/Jun/2014 16:10:00 +0000] 17439 CP Server Thread-7 _ INFO 10.20.20.160 - - [11/Jun/2014:16:10:00] "GET /heartbeat HTTP/1.1" 200 2 "" "NING/1.0"
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Activating Process 2445-collect-host-statistics
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2445-collect-host-statistics
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2445-collect-host-statistics to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2445-collect-host-statistics to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO prepare_environment begin: {u'IMPALA': u'1.2.4-1.p0.110', u'SOLR': u'1.1.0-1.cdh4.3.0.p0.21', u'CDH': u'4.5.0-1.cdh4.5.0.p0.30'}, [], []
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO Service does not request any parcels
[11/Jun/2014 16:10:00 +0000] 17439 MainThread util INFO Extracted 0 files and 0 dirs to /var/run/cloudera-scm-agent/process/2445-collect-host-statistics.
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2445-collect-host-statistics/logs
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2445-collect-host-statistics/logs to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2445-collect-host-statistics/logs to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Activating Process 2450-host-inspector
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2450-host-inspector
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2450-host-inspector to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2450-host-inspector to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO prepare_environment begin: {u'IMPALA': u'1.2.4-1.p0.110', u'SOLR': u'1.1.0-1.cdh4.3.0.p0.21', u'CDH': u'4.5.0-1.cdh4.5.0.p0.30'}, [], []
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO Service does not request any parcels
[11/Jun/2014 16:10:00 +0000] 17439 MainThread util INFO Extracted 1 files and 0 dirs to /var/run/cloudera-scm-agent/process/2450-host-inspector.
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2450-host-inspector/logs
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2450-host-inspector/logs to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2450-host-inspector/logs to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[11/Jun/2014 16:10:01 +0000] 17439 CP Server Thread-8 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:01] "GET /process/2450-host-inspector/files/inspector HTTP/1.1" 200 1455 "" "Java/1.6.0_45"
[11/Jun/2014 16:10:07 +0000] 17439 CP Server Thread-8 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:07] "GET /process/2445-collect-host-statistics/files/host_statistics/nn01.achmea.onmarc.local-nn01.achmea.onmarc.local-10.20.20.160-host-statistics.zip HTTP/1.1" 200 291217 "" "Java/1.6.0_45"
[11/Jun/2014 16:10:07 +0000] 17439 MainThread agent INFO Process with same id has changed: 2450-host-inspector.
[11/Jun/2014 16:10:07 +0000] 17439 MainThread agent INFO Deactivating process 2450-host-inspector
[11/Jun/2014 16:10:11 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:11] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-firehose%2Fmgmt-cmf-mgmt1-HOSTMONITOR-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:11 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:11] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fhadoop-0.20-mapreduce%2Fhadoop-cmf-mapreduce1-JOBTRACKER-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:13 +0000] 17439 CP Server Thread-8 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:13] "GET /compressed_agent_logs?max_bytes=37008564&top_level_dir=nn01.achmea.onmarc.local-10.20.20.160 HTTP/1.1" 200 375603 "" "Java/1.6.0_45"
[11/Jun/2014 16:10:13 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:13] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-firehose%2Fmgmt-cmf-mgmt1-SERVICEMONITOR-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:14 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:14] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fhadoop-hdfs%2Fhadoop-cmf-hdfs1-SECONDARYNAMENODE-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:14 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:14] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-eventserver%2Fmgmt-cmf-mgmt1-EVENTSERVER-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:15 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:15] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fzookeeper%2Fzookeeper-cmf-zookeeper1-SERVER-nn01.achmea.onmarc.local.log&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:17 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:17] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-firehose%2Fmgmt-cmf-mgmt1-ACTIVITYMONITOR-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:17 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:17] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-alertpublisher%2Fmgmt-cmf-mgmt1-ALERTPUBLISHER-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:18 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:18] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fstatestore%2Fstatestored%5C..*%5C.INFO%5C..&log_type=GLOG HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:19 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:19] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fhive%2Fhadoop-cmf-hive1-HIVESERVER2-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:19 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:19] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcatalogd%2Fcatalogd%5C..*%5C.INFO%5C..&log_type=GLOG HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:29 +0000] 17439 MainThread agent INFO Process with same id has changed: 2445-collect-host-statistics.
[11/Jun/2014 16:10:29 +0000] 17439 MainThread agent INFO Deactivating process 2445-collect-host-statistics
[12/Jun/2014 16:11:26 +0000] 17439 MainThread agent INFO Deleting process 2445-collect-host-statistics
[12/Jun/2014 16:11:27 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[12/Jun/2014 16:12:21 +0000] 17439 MainThread agent INFO Retiring process 2445-collect-host-statistics
[12/Jun/2014 16:12:21 +0000] 17439 MainThread agent INFO Deleting process 2450-host-inspector
[12/Jun/2014 16:12:22 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[12/Jun/2014 16:13:22 +0000] 17439 MainThread agent INFO Retiring process 2450-host-inspector
[root@nn01 cloudera-scm-agent]#
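
One way to check the seven-day hypothesis is to pull every "Activating Process ...-collect-host-statistics" line out of the agent log and look at the gaps between them. The following is only a rough sketch: the function names are made up for illustration, the regular expression assumes the line format shown in the excerpt above, and the UTC offset in each line is ignored on the assumption that it is the same throughout the log.

```python
import re
from datetime import datetime

# Matches agent log lines of the form seen in the excerpt, for example:
# [11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Activating Process 2445-collect-host-statistics
ACTIVATION = re.compile(
    r"^\[(\d{2}/\w{3}/\d{4} \d{2}:\d{2}:\d{2}) [+-]\d{4}\] .*"
    r"Activating Process \d+-collect-host-statistics\s*$"
)


def collection_starts(lines):
    """Return the timestamps at which host-statistics collection was activated."""
    starts = []
    for line in lines:
        match = ACTIVATION.match(line)
        if match:
            # %b parses English month abbreviations under the default C locale.
            starts.append(datetime.strptime(match.group(1), "%d/%b/%Y %H:%M:%S"))
    return starts


def gaps_in_days(starts):
    """Return the gap, in days, between consecutive activations."""
    return [(b - a).total_seconds() / 86400 for a, b in zip(starts, starts[1:])]
```

To use it, open the agent log (commonly /var/log/cloudera-scm-agent/cloudera-scm-agent.log, though that path is an assumption about the install) and pass the file object to collection_starts, then pass the result to gaps_in_days. Gaps close to 7.0 would support the link to the scheduled diagnostic data collection.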

1 ACCEPTED SOLUTION

Explorer

Hi Jan,

 
There was an issue with log collection (which is also part of diagnostic data collection) that can result in the agent consuming lots of CPU in certain cases. This was fixed in CM 4.8.3.

Clearing Hadoop log directories (especially HBase) might help alleviate the problem until you get a chance to upgrade.
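
Before clearing anything, it may help to see which log directories are actually large. Here is a minimal sketch that totals file sizes per subdirectory; the function names are hypothetical, and /var/log as the base is an assumption taken from the log paths in the agent log above. It only reports sizes and deletes nothing.

```python
import os


def dir_size_bytes(path):
    """Total size of regular files under path, skipping symlinks and unreadable entries."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file vanished or is not readable; ignore it.
                pass
    return total


def largest_log_dirs(base="/var/log", top=10):
    """Return (size_bytes, directory) pairs for the biggest subdirectories of base."""
    sizes = []
    for entry in os.listdir(base):
        full = os.path.join(base, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            sizes.append((dir_size_bytes(full), full))
    return sorted(sizes, reverse=True)[:top]
```

Calling largest_log_dirs() as root on the affected host lists the heaviest directories first, which should show whether the HBase logs or something else dominate.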

Explorer

I'm running Cloudera Manager 5.3.2 and have the exact same issue every Wednesday, sometime between 4:00pm and 4:40pm. Because cloudera-scm-agent is too busy, Cloudera Manager sends a lot of warnings and errors.

It suddenly started in July 2016. Before that, it had been running fine for a year. I have to restart the cloudera-scm-agent service when this happens.

Is there something I can do? We may not be able to upgrade to the latest version soon.

Explorer

The culprit is "Send Diagnostic Data to Cloudera Automatically" in Administration -> Settings -> Support. My cluster is running CDH 5.3.2, so my guess is that Cloudera shut down a service that this version relied on; the cluster could no longer connect to it, and the agent went haywire. After I turned that setting off, the issue never happened again.