The python process takes 100% CPU time and runs on just 1 CPU core (the NameNode server has 24 cores). Because of the 100% utilization, Cloudera Manager reports Bad Health issues, such as clock offset and agent status.
What could be the reason for the 100% CPU utilization of python?
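To confirm what top is showing: 100% in top means one full core, which on a 24-core NameNode host is only about 4% of total capacity, so the pattern points at a single busy thread. The sketch below is a minimal example. It assumes a Linux host with /proc and that you pass the PID of the busy python process, for example as shown by top. It samples utime + stime from /proc/<pid>/stat twice and converts the difference into percent of one core. The function names are illustrative and not part of any Cloudera tool.

```python
import os
import sys
import time


def cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split after the last ')' and index the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11]
    # and stime (field 15) is rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(stat_before, stat_after, elapsed_s, clk_tck):
    """CPU used between two samples; 100.0 means one full core."""
    delta = cpu_ticks(stat_after) - cpu_ticks(stat_before)
    return 100.0 * delta / (clk_tck * elapsed_s)


def read_stat(pid):
    with open("/proc/%d/stat" % pid) as f:
        return f.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])  # PID of the busy python process, e.g. from top
    clk_tck = os.sysconf("SC_CLK_TCK")
    t0, before = time.monotonic(), read_stat(pid)
    time.sleep(5)
    t1, after = time.monotonic(), read_stat(pid)
    pct = cpu_percent(before, after, t1 - t0, clk_tck)
    print("PID %d: %.1f%% of one core, %d cores on this host"
          % (pid, pct, os.cpu_count()))
```

If this reports close to 100% while the other cores are mostly idle, the process is CPU-bound in one thread; `top -H -p <pid>` then shows which thread it is.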
After a couple of days Cloudera Manager became stable and the python process took just 1% CPU time, but today the problem occurred again.
It seems there are 7 days between the starts of the problem.
Is it related to the Scheduled Diagnostic Data Collection Frequency setting in Cloudera Manager, which is 7 days?
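One way to test the 7-day hypothesis is to check whether each incident start lines up with a fixed-period schedule. The sketch below assumes you know one run time of the scheduled collection (here the 11 Jun 2014 16:10 run visible in the agent log, in UTC) and uses hypothetical incident times. `on_schedule` and the one-hour tolerance are illustrative choices, not Cloudera settings, and the code needs Python 3 (for `timedelta % timedelta`).

```python
from datetime import datetime, timedelta


def on_schedule(anchor, period, incidents, tolerance):
    """For each incident, report whether it lies within `tolerance` of some
    scheduled run anchor + k * period (k may be any integer)."""
    results = []
    for t in incidents:
        # timedelta % timedelta keeps the sign of `period`, so offset is in [0, period)
        offset = (t - anchor) % period
        distance = min(offset, period - offset)  # nearest run before or after t
        results.append((t, distance <= tolerance))
    return results


if __name__ == "__main__":
    anchor = datetime(2014, 6, 11, 16, 10)  # run seen in the agent log (UTC)
    week = timedelta(days=7)
    # Hypothetical incident start times (UTC); replace with your own.
    incidents = [datetime(2014, 6, 18, 16, 12), datetime(2014, 6, 15, 9, 0)]
    for t, ok in on_schedule(anchor, week, incidents, timedelta(hours=1)):
        print(t, t.strftime("%A"), "matches schedule" if ok else "off schedule")
```

Incidents that keep landing within the tolerance of the weekly run support the hypothesis; the modulo approach also handles incidents before the anchor run.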
In the cloudera-scm-agent.log we see:
[11/Jun/2014 16:10:00 +0000] 17439 CP Server Thread-7 _ INFO 10.20.20.160 - - [11/Jun/2014:16:10:00] "GET /heartbeat HTTP/1.1" 200 2 "" "NING/1.0"
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Activating Process 2445-collect-host-statistics
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2445-collect-host-statistics
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2445-collect-host-statistics to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2445-collect-host-statistics to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO prepare_environment begin: {u'IMPALA': u'1.2.4-1.p0.110', u'SOLR': u'1.1.0-1.cdh4.3.0.p0.21', u'CDH': u'4.5.0-1.cdh4.5.0.p0.30'}, [], []
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO Service does not request any parcels
[11/Jun/2014 16:10:00 +0000] 17439 MainThread util INFO Extracted 0 files and 0 dirs to /var/run/cloudera-scm-agent/process/2445-collect-host-statistics.
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2445-collect-host-statistics/logs
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2445-collect-host-statistics/logs to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2445-collect-host-statistics/logs to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Activating Process 2450-host-inspector
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2450-host-inspector
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2450-host-inspector to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2450-host-inspector to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO prepare_environment begin: {u'IMPALA': u'1.2.4-1.p0.110', u'SOLR': u'1.1.0-1.cdh4.3.0.p0.21', u'CDH': u'4.5.0-1.cdh4.5.0.p0.30'}, [], []
[11/Jun/2014 16:10:00 +0000] 17439 MainThread parcel INFO Service does not request any parcels
[11/Jun/2014 16:10:00 +0000] 17439 MainThread util INFO Extracted 1 files and 0 dirs to /var/run/cloudera-scm-agent/process/2450-host-inspector.
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/2450-host-inspector/logs
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/2450-host-inspector/logs to root (0) root (0)
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/2450-host-inspector/logs to 0751
[11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[11/Jun/2014 16:10:01 +0000] 17439 CP Server Thread-8 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:01] "GET /process/2450-host-inspector/files/inspector HTTP/1.1" 200 1455 "" "Java/1.6.0_45"
[11/Jun/2014 16:10:07 +0000] 17439 CP Server Thread-8 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:07] "GET /process/2445-collect-host-statistics/files/host_statistics/nn01.achmea.onmarc.local-nn01.achmea.onmarc.local-10.20.20.160-host-statistics.zip HTTP/1.1" 200 291217 "" "Java/1.6.0_45"
[11/Jun/2014 16:10:07 +0000] 17439 MainThread agent INFO Process with same id has changed: 2450-host-inspector.
[11/Jun/2014 16:10:07 +0000] 17439 MainThread agent INFO Deactivating process 2450-host-inspector
[11/Jun/2014 16:10:11 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:11] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-firehose%2Fmgmt-cmf-mgmt1-HOSTMONITOR-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:11 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:11] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fhadoop-0.20-mapreduce%2Fhadoop-cmf-mapreduce1-JOBTRACKER-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:13 +0000] 17439 CP Server Thread-8 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:13] "GET /compressed_agent_logs?max_bytes=37008564&top_level_dir=nn01.achmea.onmarc.local-10.20.20.160 HTTP/1.1" 200 375603 "" "Java/1.6.0_45"
[11/Jun/2014 16:10:13 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:13] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-firehose%2Fmgmt-cmf-mgmt1-SERVICEMONITOR-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:14 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:14] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fhadoop-hdfs%2Fhadoop-cmf-hdfs1-SECONDARYNAMENODE-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:14 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:14] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-eventserver%2Fmgmt-cmf-mgmt1-EVENTSERVER-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:15 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:15] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fzookeeper%2Fzookeeper-cmf-zookeeper1-SERVER-nn01.achmea.onmarc.local.log&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:17 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:17] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-firehose%2Fmgmt-cmf-mgmt1-ACTIVITYMONITOR-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:17 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:17] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcloudera-scm-alertpublisher%2Fmgmt-cmf-mgmt1-ALERTPUBLISHER-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:18 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:18] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fstatestore%2Fstatestored%5C..*%5C.INFO%5C..&log_type=GLOG HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:19 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:19] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fhive%2Fhadoop-cmf-hive1-HIVESERVER2-nn01.achmea.onmarc.local.log.out&log_type=LOG4J HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:19 +0000] 17439 CP Server Thread-10 _cplogging INFO 10.20.20.160 - - [11/Jun/2014:16:10:19] "GET /retrieve_log_compressed?byte_limit=37008564&end_time=1402495800020&search_timeout_millis=60000&log_path=%2Fvar%2Flog%2Fcatalogd%2Fcatalogd%5C..*%5C.INFO%5C..&log_type=GLOG HTTP/1.1" 200 - "" "Java/1.6.0_45"
[11/Jun/2014 16:10:29 +0000] 17439 MainThread agent INFO Process with same id has changed: 2445-collect-host-statistics.
[11/Jun/2014 16:10:29 +0000] 17439 MainThread agent INFO Deactivating process 2445-collect-host-statistics
[12/Jun/2014 16:11:26 +0000] 17439 MainThread agent INFO Deleting process 2445-collect-host-statistics
[12/Jun/2014 16:11:27 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[12/Jun/2014 16:12:21 +0000] 17439 MainThread agent INFO Retiring process 2445-collect-host-statistics
[12/Jun/2014 16:12:21 +0000] 17439 MainThread agent INFO Deleting process 2450-host-inspector
[12/Jun/2014 16:12:22 +0000] 17439 MainThread agent INFO Triggering supervisord update.
[12/Jun/2014 16:13:22 +0000] 17439 MainThread agent INFO Retiring process 2450-host-inspector
[root@nn01 cloudera-scm-agent]#
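To see how regularly the collection actually runs on a host, the activation lines can be pulled out of cloudera-scm-agent.log. This is a minimal sketch that assumes the line format shown above (a bracketed timestamp with UTC offset, then `Activating Process <id>-<name>`). It prints each collect-host-statistics activation with its weekday and the gap since the previous one. The regex and function names are my own, and the log format may differ between Cloudera Manager versions.

```python
import re
import sys
from datetime import datetime

# Matches agent log lines like:
# [11/Jun/2014 16:10:00 +0000] 17439 MainThread agent INFO Activating Process 2445-collect-host-statistics
ACTIVATE = re.compile(
    r"^\[(?P<ts>\d{2}/\w{3}/\d{4} \d{2}:\d{2}:\d{2} [+-]\d{4})\]"
    r".* Activating Process (?P<proc>\d+-\S+)"
)


def activations(lines, name="collect-host-statistics"):
    """Yield (timestamp, process id) for each activation of process `name`."""
    for line in lines:
        m = ACTIVATE.match(line)
        if m and m.group("proc").endswith("-" + name):
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y %H:%M:%S %z")
            yield ts, m.group("proc")


def report(events):
    """Print each activation with its weekday and the gap since the previous one."""
    prev = None
    for ts, proc in events:
        gap = "" if prev is None else "  (+%s)" % (ts - prev)
        print("%s %-9s %s%s" % (ts.isoformat(), ts.strftime("%A"), proc, gap))
        prev = ts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python agent_runs.py /path/to/cloudera-scm-agent.log
    with open(sys.argv[1]) as f:
        report(activations(f))
```

Gaps that are consistently close to 7 days, starting at the same time of day, would be consistent with the weekly diagnostic collection.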
Created 06-18-2014 01:33 PM
Hi Jan,
Created 06-18-2014 12:37 PM
Created 11-16-2016 05:08 PM
I'm running Cloudera Manager 5.3.2 and have the exact same issue every Wednesday, at some time between 4:00 pm and 4:40 pm. Because the cloudera-scm-agent is too busy, Cloudera Manager sends a lot of warnings and errors.
It started suddenly in July 2016; before that, it had been running fine for a year. I have to restart the cloudera-scm-agent service when this happens.
Is there something I can do? We may not upgrade to the latest version soon.
Created 10-20-2017 01:20 PM
The culprit is "Send Diagnostic Data to Cloudera Automatically" in Administration -> Settings -> Support. Because my cluster is running CDH 5.3.2, I guessed that Cloudera had probably shut down a service, so my cluster could not connect to it, and the agent went crazy. After turning that setting off, the issue never happened again.