Created 12-29-2018 02:14 AM
Hi,
I have an issue in my CDH cluster. The cluster has been running for a few days, but today the Host Machine Health Test on the CM Hosts tab shows:
This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.
The Health History shows an error and then becomes healthy again.
CM Agent log:
[29/Dec/2018 16:42:11 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:12 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:13 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 MonitorDaemon-Reporter firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:39 +0000] 24147 MainThread agent WARNING Long HB processing time: 6.51629519463
[29/Dec/2018 16:42:53 +0000] 24147 MainThread agent WARNING Long HB processing time: 5.32933592796
[29/Dec/2018 16:45:06 +0000] 24147 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.07 min:0.10 mean:0.29 max:0.88 LIFE_MAX:0.72
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter throttling_logger ERROR (3 skipped) Error sending messages to firehose: mgmt-SERVICEMONITOR-02008505edb7b85b2295119db7eba412
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 125, in _send
    self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 482, in transceive
    self.write_framed_message(request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 501, in write_framed_message
    self.conn.request(req_method, self.req_resource, req_body, req_headers)
  File "/usr/lib64/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 840, in send
    self.sock.sendall(data)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
[29/Dec/2018 16:45:08 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 MainThread agent WARNING Long HB processing time: 6.45350694656
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:10 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:15 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:17 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:19 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:22 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
The address and port in the log are printed by my edit to firehose.py; telnet to that hostname and port works fine.
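Roughly, the extra check I did looks like this (just a sketch using the hostname and ports from my log, not the agent's real code):

import socket

# The two ports that appear in my agent log.
for host, port in [("hadoop05.ddxq.idc", 9997), ("hadoop05.ddxq.idc", 9995)]:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    try:
        s.connect((host, port))
        print("connect OK: %s:%d" % (host, port))
    except socket.error as e:
        print("connect FAILED: %s:%d (%s)" % (host, port, e))
    finally:
        s.close()

This only proves that the TCP connect succeeds, the same as telnet does.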
Then, in /var/log/cloudera-scm-firehose, I can't find any corresponding error in the logs.
I found a similar issue: https://community.cloudera.com/t5/Cloudera-Altus-Director/MonitorDaemon-Reporter-throttling-logger-E...
I tried increasing the Service Monitor and Host Monitor memory, but that did not solve it.
Please help me!
CM version: 5.16.1
The machine configuration is shown in the attached screenshot.
Created 01-01-2019 10:54 PM
I couldn't find an option to edit my post, and all the images showed an error, so I will describe my problem again.
Hi,
I have an issue in my CDH cluster. The cluster has been running for a few days, but today the Host Machine Health Test on the CM Hosts tab shows:
Agent Status: This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.
The Health History shows an error and then becomes healthy again:
2:30 PM - 3 Became Good
2:29:03 PM - Network Interface Speed Unknown, 1 Still Bad
2:28:58 PM - 1 Became Bad, 1 Became Unknown
2:26 PM - 3 Became Good
2:25:18 PM - Network Interface Speed Unknown, 1 Still Bad
2:25:13 PM - 1 Became Bad, 1 Became Unknown
2:23 PM - 3 Became Good
CM Agent log:
[29/Dec/2018 16:42:11 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:12 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:13 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 MonitorDaemon-Reporter firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:39 +0000] 24147 MainThread agent WARNING Long HB processing time: 6.51629519463
[29/Dec/2018 16:42:53 +0000] 24147 MainThread agent WARNING Long HB processing time: 5.32933592796
[29/Dec/2018 16:45:06 +0000] 24147 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.07 min:0.10 mean:0.29 max:0.88 LIFE_MAX:0.72
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter throttling_logger ERROR (3 skipped) Error sending messages to firehose: mgmt-SERVICEMONITOR-02008505edb7b85b2295119db7eba412
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 125, in _send
    self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 482, in transceive
    self.write_framed_message(request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 501, in write_framed_message
    self.conn.request(req_method, self.req_resource, req_body, req_headers)
  File "/usr/lib64/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 840, in send
    self.sock.sendall(data)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
[29/Dec/2018 16:45:08 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 MainThread agent WARNING Long HB processing time: 6.45350694656
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:10 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:15 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:17 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:19 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:22 +0000] 24147 ImpalaDaemonQueryMonitoring firehose INFO >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
I edited firehose.py to print the address and port, and telnet to that hostname and port works fine.
Then, in /var/log/cloudera-scm-firehose, I can't find any corresponding error in the logs.
I found a similar issue: https://community.cloudera.com/t5/Cloudera-Altus-Director/MonitorDaemon-Reporter-throttling-logger-E...
I tried increasing the Service Monitor and Host Monitor memory, but that did not solve it.
Please help me!
CM version: 5.16.1
OS version: CentOS 7.3.1611
Created 01-02-2019 11:02 AM
Thanks for discussing the issue you are facing.
This issue does not pose any threat to the functionality of your cluster, so the impact should be minimal.
Let's start with what we know:
- The agent attempts to make an HTTP connection and gets the following:
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter throttling_logger ERROR (3 skipped) Error sending messages to firehose: mgmt-SERVICEMONITOR-02008505edb7b85b2295119db7eba412
. . . .
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
This tells us a couple of important things:
(1)
The agent was able to make a TCP connection to the Host Monitor on port 9995 (the Host Monitor Listen Port). After that, the communication was severed: the connection appears to have been dropped somewhere between the agent and the Host Monitor server (see the small socket sketch after point (2)).
(2)
This appears to happen frequently.
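To make point (1) concrete: a bare TCP connect (which is all telnet proves) can still succeed even though a later write on that same connection fails. A rough sketch that mimics the failing step, using the hostname and Host Monitor port from your agent log (the agent actually speaks Avro over HTTP POST, so this is only an approximation, not what the agent runs), would be:

import socket

HOST, PORT = "hadoop05.ddxq.idc", 9995  # values taken from the agent log above

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(10)
try:
    sock.connect((HOST, PORT))                # the step telnet already proves
    sock.sendall(b"GET / HTTP/1.0\r\n\r\n")   # a write is where the agent fails
    print("got %d bytes back" % len(sock.recv(1024)))
except socket.error as e:
    print("socket error: %s" % e)             # e.g. [Errno 32] Broken pipe / reset
finally:
    sock.close()

If the connect works but the send or recv errors out, that points at something dropping established connections (a firewall or conntrack, a proxy, or the Host Monitor process itself closing them).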
The agent will periodically capture host information and then upload it to the Host Monitor for indexing and storage. The first place to look would be the Host Monitor log in /var/log/cloudera-scm-firehose/:
mgmt-cmf-mgmt-HOSTMONITOR*
You mentioned you could not find an error log, but there is only the one log file, and it contains all of the Host Monitor's log output.
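If it helps to sift through that file, something along these lines would pull out any non-INFO lines (the directory and file-name pattern are just the defaults, so adjust them to whatever is on your host):

import glob

# Print WARN/ERROR/FATAL lines from the Host Monitor log(s).
for path in glob.glob("/var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-HOSTMONITOR*"):
    with open(path) as f:
        for line in f:
            if any(level in line for level in ("WARN", "ERROR", "FATAL")):
                print("%s: %s" % (path, line.rstrip()))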
If you do not see anything wrong there, then we might need to employ more advanced diagnostics to determine what is happening.
Created 01-02-2019 07:14 PM
The Host Monitor log:
2019-01-03 07:44:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-02T23:40:00.000Z
2019-01-03 07:44:02,042 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT1.013S, numStreamsChecked=42228, numStreamsRolledUp=855
2019-01-03 07:48:01,011 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 07:49:01,028 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-02T23:49:01.028Z, forMigratedData=false
2019-01-03 07:54:01,028 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-02T23:54:01.028Z, forMigratedData=false
2019-01-03 07:54:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-02T23:50:00.000Z
2019-01-03 07:54:01,949 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.920S, numStreamsChecked=42228, numStreamsRolledUp=855
2019-01-03 07:58:01,013 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 07:59:01,028 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-02T23:59:01.028Z, forMigratedData=false
2019-01-03 07:59:29,591 INFO com.cloudera.cmf.BasicScmProxy: Failed request to SCM: 302
2019-01-03 07:59:30,591 INFO com.cloudera.cmf.BasicScmProxy: Authentication to SCM required.
2019-01-03 07:59:30,668 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2019-01-03 07:59:30,674 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2019-01-03 08:04:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:04:01.029Z, forMigratedData=false
2019-01-03 08:04:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:01,864 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.833S, numStreamsChecked=42228, numStreamsRolledUp=855
2019-01-03 08:04:01,864 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT600S to rollup=HOURLY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:02,689 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.825S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:04:02,690 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT3600S to rollup=SIX_HOURLY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:03,448 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.758S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:04:03,448 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT21600S to rollup=DAILY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:03,939 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.491S, numStreamsChecked=42228, numStreamsRolledUp=2061
2019-01-03 08:04:03,939 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT86400S to rollup=WEEKLY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:04,404 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.464S, numStreamsChecked=42228, numStreamsRolledUp=2061
2019-01-03 08:09:01,010 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 08:09:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:09:01.029Z, forMigratedData=false
2019-01-03 08:14:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:14:01.030Z, forMigratedData=false
2019-01-03 08:14:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-03T00:10:00.000Z
2019-01-03 08:14:01,961 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.929S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:19:01,014 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 08:19:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:19:01.030Z, forMigratedData=false
2019-01-03 08:24:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:24:01.030Z, forMigratedData=false
2019-01-03 08:24:01,031 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-03T00:20:00.000Z
2019-01-03 08:24:02,274 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT1.243S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:29:01,031 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:29:01.031Z, forMigratedData=false
2019-01-03 08:30:01,034 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
The Service Monitor log:
2019-01-03 09:07:01,035 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1680dc02bc4071d
2019-01-03 09:07:16,122 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT167.412S, numStreamsChecked=6320720, numStreamsRolledUp=7678
2019-01-03 09:07:16,122 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT600S to rollup=HOURLY for rollupTimestamp=2019-01-03T01:00:00.000Z
2019-01-03 09:07:54,966 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 09:08:06,015 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3ef8132e connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:08:06,020 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:08:06,025 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3680dc027f7070d
2019-01-03 09:08:43,137 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT87.015S, numStreamsChecked=6320720, numStreamsRolledUp=7681
2019-01-03 09:09:06,021 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xf496593 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:09:06,025 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:09:06,029 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3680dc027f7070f
2019-01-03 09:09:28,710 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T01:09:28.710Z, forMigratedData=false
2019-01-03 09:10:06,019 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4cbadc22 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:10:06,022 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:10:06,026 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1680dc02bc40724
2019-01-03 09:11:06,026 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x43eb1654 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:11:06,031 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:11:06,035 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1680dc02bc40727
2019-01-03 09:11:26,101 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop01.ddxq.idc:9083
2019-01-03 09:11:26,101 INFO hive.metastore: Opened a connection to metastore, current connections: 1
2019-01-03 09:11:26,103 INFO hive.metastore: Connected to metastore.
2019-01-03 09:11:26,812 INFO hive.metastore: Closed a connection to metastore, current connections: 0
2019-01-03 09:12:06,078 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x7b46b68a connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:12:06,082 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:12:06,087 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3680dc027f70713
2019-01-03 09:13:11,034 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6359e1f1 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
No error messages.
Created 01-03-2019 10:21 AM
I agree that it is possible (and even likely) that the issue is more on the agent side. What we really need to see is what the agent is doing at the time it is having trouble connecting. A great way to have a peek at the internals is to send a SIGQUIT signal to the agent, which will trigger it to dump its thread stacks to the agent log. If you could run this a few times while CM is showing that the agent is out of contact with the Host Monitor, it might give us some clues as to whether the agent is under stress at that time or not.
Created 01-03-2019 10:23 AM
Oops... clicked "POST" before telling you how to get the agent to dump the thread stacks to the agent log. You can run the following:
kill -SIGQUIT `cat /var/run/cloudera-scm-agent/cloudera-scm-agent.pid`
This will not cause the agent to restart or anything so it won't impact processing.
If you can run the kill -SIGQUIT a few times, that would give us an idea of how the threads are progressing.
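For what it's worth, there is nothing exotic behind that dump: the agent installs a signal handler that walks every thread's current frame and writes the stacks out (the dumpstacks helper in cmf/util/__init__.py). A stripped-down sketch of the same idea, not the agent's actual code, looks like this:

import signal
import sys
import threading
import traceback

def dump_stacks(signum, frame):
    # Map thread ids to names, then print each thread's current stack.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    for thread_id, stack in sys._current_frames().items():
        print("# Thread: %s(%d)" % (names.get(thread_id, "unknown"), thread_id))
        for filename, lineno, func, line in traceback.extract_stack(stack):
            print('  File: "%s", line %d, in %s' % (filename, lineno, func))
            if line:
                print("    %s" % line.strip())

signal.signal(signal.SIGQUIT, dump_stacks)

The handler runs inside the existing process, which is why sending SIGQUIT does not restart the agent or interrupt its work.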
Created 01-03-2019 07:03 PM
@bgooley I ran this command.
The output in /var/log/cloudera-scm-agent/cloudera-scm-agent.log is as follows:
Dumping all Thread Stacks ... # Thread: Monitor-GenericMonitor(140497188263680) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496701748992) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140495628007168) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496676570880) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: CP Server Thread-9(140497741920000) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get() File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait() File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496651392768) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140495091136256) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: MonitorDaemon-Reporter(140497213441792) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: 
"/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 50, in run self._fn(*self._args, **self._kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/__init__.py", line 163, in _report self._report_for_monitors(monitors) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/__init__.py", line 214, in _report_for_monitors self.firehoses.send_smon_update(service_updates, role_updates, None) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehoses.py", line 149, in send_smon_update impala_query_updates=impala_query_updates) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehoses.py", line 181, in _send_agent_message firehose.send(dict(agent_msgs=[agentmsg])) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 107, in send self._send(messages) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 124, in _send self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages))) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 136, in request self.write_call_request(message_name, request_datum, buffer_encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 178, in write_call_request self.write_request(message.request, request_datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 182, in write_request datum_writer.write(request_datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 770, in write self.write_data(self.writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data self.write_record(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record self.write_data(field.type, datum.get(field.name), encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data self.write_record(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record self.write_data(field.type, datum.get(field.name), encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data self.write_array(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array self.write_data(writers_schema.items, item, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data self.write_record(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record self.write_data(field.type, datum.get(field.name), encoder) File: 
"/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data self.write_array(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array self.write_data(writers_schema.items, item, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data self.write_record(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record self.write_data(field.type, datum.get(field.name), encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 799, in write_data self.write_union(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 879, in write_union self.write_data(writers_schema.schemas[index_of_schema], datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data self.write_array(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array self.write_data(writers_schema.items, item, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data self.write_record(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record self.write_data(field.type, datum.get(field.name), encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data self.write_array(writers_schema, datum, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array self.write_data(writers_schema.items, item, encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data raise schema.AvroException(fail_msg) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record self.write_data(field.type, datum.get(field.name), encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data raise schema.AvroException(fail_msg) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 840, in write_array encoder.write_long(0) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data raise schema.AvroException(fail_msg) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record self.write_data(field.type, datum.get(field.name), encoder) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data raise schema.AvroException(fail_msg) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 879, in write_union self.write_data(writers_schema.schemas[index_of_schema], datum, encoder) File: 
"/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data raise schema.AvroException(fail_msg) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 303, in write_int self.write_long(datum); File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 313, in write_long self.write(chr(datum)) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 281, in write self.writer.write(datum) # Thread: HTTPServer Thread-2(140498278790912) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/python2.7/threading.py", line 764, in run self.__target(*self.__args, **self.__kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 187, in _start_http_thread self.httpserver.start() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1838, in start self.tick() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1950, in tick return File: "/usr/lib64/python2.7/socket.py", line 202, in accept sock, addr = self._sock.accept() # Thread: Monitor-GenericMonitor(140495577650944) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140495586043648) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140495594436352) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: DnsResolutionMonitor(140497221834496) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/stoppable_thread.py", line 34, in run time.sleep(sleep) # Thread: Monitor-GenericMonitor(140495602829056) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: 
"/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496148092672) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496156485376) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-HostMonitor(140497230227200) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: MainThread(140498665850688) File: "/usr/lib64/cmf/agent/build/env/bin/cmf-agent", line 12, in <module> load_entry_point('cmf==5.16.1', 'console_scripts', 'cmf-agent')() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 3127, in main main_impl() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 3110, in main_impl agent.start() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 852, in start self.__issue_heartbeat() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 754, in __issue_heartbeat heartbeat_response = self.send_heartbeat(heartbeat) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 1401, in send_heartbeat response = self._send_heartbeat(heartbeat) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 1442, in _send_heartbeat response = self.requestor.request('heartbeat', heartbeat_data) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 141, in request return self.issue_request(call_request, message_name, request_datum) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 254, in issue_request call_response = self.transceiver.transceive(call_request) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 483, in transceive result = self.read_framed_message() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 487, in read_framed_message response = self.conn.getresponse() File: "/usr/lib64/python2.7/httplib.py", line 1089, in 
getresponse response.begin() File: "/usr/lib64/python2.7/httplib.py", line 476, in begin self.msg = HTTPMessage(self.fp, 0) File: "/usr/lib64/python2.7/mimetools.py", line 25, in __init__ rfc822.Message.__init__(self, fp, seekable) File: "/usr/lib64/python2.7/rfc822.py", line 108, in __init__ self.readheaders() File: "/usr/lib64/python2.7/httplib.py", line 315, in readheaders line = self.fp.readline(_MAXLINE + 1) File: "/usr/lib64/python2.7/socket.py", line 476, in readline data = self._sock.recv(self._rbufsize) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 193, in dumpstacks for filename, lineno, name, line in traceback.extract_stack(stack): # Thread: Monitor-GenericMonitor(140496131307264) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: _TimeoutMonitor(140498287183616) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/plugins.py", line 471, in run time.sleep(self.interval) # Thread: Monitor-GenericMonitor(140495611221760) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496114521856) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140495619614464) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496693356288) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: CP Server Thread-7(140497758705408) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap 
self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get() File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait() File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140495065958144) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: CredentialManager(140498388784896) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/kt_renewer.py", line 181, in run self._trigger.wait(_RENEWAL_PERIOD) File: "/usr/lib64/python2.7/threading.py", line 361, in wait _sleep(delay) # Thread: Monitor-GenericMonitor(140496164878080) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496659785472) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Monitor-GenericMonitor(140496139699968) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Metadata-Plugin(140498303969024) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/python2.7/threading.py", line 764, in run self.__target(*self.__args, **self.__kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 489, in wrapper return fn(self, *args, **kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/audit/navigator_thread.py", line 168, in _monitor_logs time.sleep(event_poll_interval) # Thread: CP Server Thread-11(140497725134592) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap 
self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get() File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait() File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Profile-Plugin(140498295576320) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/python2.7/threading.py", line 764, in run self.__target(*self.__args, **self.__kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 489, in wrapper return fn(self, *args, **kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/audit/navigator_thread.py", line 168, in _monitor_logs time.sleep(event_poll_interval) # Thread: Monitor-GenericMonitor(140496684963584) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time) File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: Audit-Plugin(140498312361728) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/python2.7/threading.py", line 764, in run self.__target(*self.__args, **self.__kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 489, in wrapper return fn(self, *args, **kwargs) File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/audit/navigator_thread.py", line 168, in _monitor_logs time.sleep(event_poll_interval) # Thread: Thread-13(140497196656384) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg/threadpool.py", line 147, in run request = self._requests_queue.get(True, self._poll_timeout) File: "/usr/lib64/python2.7/Queue.py", line 177, in get self.not_empty.wait(remaining) File: "/usr/lib64/python2.7/threading.py", line 361, in wait _sleep(delay) # Thread: CP Server Thread-6(140497767098112) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run() File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get() File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait() File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire() # Thread: CP Server Thread-4(140498262005504) File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner() File: "/usr/lib64/python2.7/threading.py", 
line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
# Thread: Monitor-GenericMonitor(140495057565440)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
# Thread: CP Server Thread-8(140497750312704)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
# Thread: Monitor-GenericMonitor(140496668178176)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
# Thread: CP Server Thread-12(140497238619904)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
# Thread: CP Server Thread-10(140497733527296)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time)
# Thread: ImpalaDaemonQueryMonitoring(140496122914560)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 871, in _check_for_queries completed_query_profiles))
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 909, in _get_completed_query_profiles return completed_query_ids, completed_query_profiles, True
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 601, in get_completed_queries return completed_queries
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 484, in get_completed_queries return next_start_datetime, next_last_file_timestamp, completed_query_profiles
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 313, in _get_completed_queries return next_start_datetime, latest_file_timestamp, completed_queries
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_streamer.py", line 114, in __init__ self.__filtered_file_list = self.__apply_file_filter() return filter_context["file_list"]
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsCommon-0.1-py2.7.egg/clusterstats/common/chain.py", line 25, in __call__ return True
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 103, in __call__ return True
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 108, in __set_start_offset f.set_start_offset(event.get_offset()) return event return event
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_reader.py", line 84, in get_prev_event return event return event, ''.join(prior_data)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file_line_reader.py", line 86, in get_next_line return line return data, block_offset
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file_line_reader.py", line 203, in __read_data_till_next_newline return line
# Thread: MonitorDaemon-Scheduler(140497205049088)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 361, in wait _sleep(delay)
# Thread: CP Server Thread-5(140497775490816)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
# Thread: CP Server Thread-3(140498270398208)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait waiter.acquire()
Created 11-25-2019 04:28 AM
We are facing the exact same issue.
Could you please help us understand how you managed to resolve it?
thanks,
Pratik
Created 11-25-2019 09:03 AM
Hi @AstroPratik,
First, in order to provide the best help, we need to make sure we have accurate information about the issue you are observing. My guess is that you are seeing the same health alert in Cloudera Manager, but we also need to confirm that you are seeing the same messages in the agent log.
If so, please follow the instructions to capture a thread dump via the SIGQUIT signal. The "kill -SIGQUIT" command I provided earlier only works on Cloudera Manager 5.x. If you are using CM 6, you can use the following instead:
kill -SIGQUIT $(systemctl show -p MainPID cloudera-scm-agent.service 2>/dev/null | cut -d= -f2)
If you do run the kill -SIGQUIT, make sure to run it a couple of times so we can compare snapshots, and make sure you capture the thread dump while the problem is actually occurring.
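As a rough illustration only, here is a minimal shell sketch of capturing two snapshots while the alert is active. It assumes the agent is managed by systemd (CM 6 style), that the dump lands in the agent log under /var/log/cloudera-scm-agent, and that the 30-second gap is just an arbitrary choice; adjust all of these to your environment.

# Capture two thread-dump snapshots from the CM agent, ~30 seconds apart,
# while the "not in contact with the Host Monitor" alert is showing.
AGENT_PID=$(systemctl show -p MainPID cloudera-scm-agent.service 2>/dev/null | cut -d= -f2)
for i in 1 2; do
  kill -SIGQUIT "$AGENT_PID"
  sleep 30   # interval between snapshots (illustrative value)
done
# The dumps should then show up in the agent log; look for the thread markers:
grep -c "# Thread:" /var/log/cloudera-scm-agent/cloudera-scm-agent.log

Comparing the two snapshots tells us whether the same thread is stuck in the same place, which is much more useful than a single dump.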
NOTE: After reviewing the previous poster's thread dump, it appears that a thread spawned to collect information for a diagnostic bundle is slow to make progress; the thread that uploads service and host information to the Host Monitor and Service Monitor also appears to be slow.
Since collecting a diagnostic bundle is something that does not happen often, it is likely that the bundle creation is what triggered the original health event.
There are a number of possible causes for "firehose" trouble, though, so it is important that we understand the facts of your situation before drawing any conclusions.
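For example, one basic fact worth ruling out early is plain network reachability from the affected host to the firehose (Host Monitor / Service Monitor) port that the agent log keeps printing; in the original log that was port 9997 on hadoop05.ddxq.idc. A rough check, assuming bash and that the same host and port apply in your cluster (nc or telnet works just as well):

# Quick reachability check of the firehose port from the affected host.
# Host and port are taken from the earlier agent log and may differ for you.
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/hadoop05.ddxq.idc/9997' \
  && echo "firehose port reachable" \
  || echo "firehose port NOT reachable"

If the port is reachable, that points us back at slow processing inside the agent or the monitor services rather than the network, which is why the thread dumps matter.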