Support Questions


CM 5.16.1 Agent cannot send messages to firehose

Explorer

Hi,

   I have an issue in my CDH cluster. The cluster has been running for a few days, but today, in the CM Hosts tab, the Host Machine Health Test shows:

This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.

The Health History shows an error, then becomes healthy again.

CM Agent log:

 

[29/Dec/2018 16:42:11 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:12 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:13 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 MonitorDaemon-Reporter firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:39 +0000] 24147 MainThread agent        WARNING  Long HB processing time: 6.51629519463
[29/Dec/2018 16:42:53 +0000] 24147 MainThread agent        WARNING  Long HB processing time: 5.32933592796
[29/Dec/2018 16:45:06 +0000] 24147 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.07 min:0.10 mean:0.29 max:0.88 LIFE_MAX:0.72
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter throttling_logger ERROR    (3 skipped) Error sending messages to firehose: mgmt-SERVICEMONITOR-02008505edb7b85b2295119db7eba412
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 125, in _send
    self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 482, in transceive
    self.write_framed_message(request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 501, in write_framed_message
    self.conn.request(req_method, self.req_resource, req_body, req_headers)
  File "/usr/lib64/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 840, in send
    self.sock.sendall(data)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
[29/Dec/2018 16:45:08 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 MainThread agent        WARNING  Long HB processing time: 6.45350694656
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:10 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:15 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:17 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:19 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:22 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<

The address and port above come from a debug print I added to firehose.py; telnet to that hostname and port works fine.
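In Python, the equivalent of that telnet check (host and ports taken from the log above) is something like this sketch:

import socket

def port_open(host, port, timeout=5):
    # Equivalent of "telnet host port": succeeds if the TCP handshake completes.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

# 9995 (Host Monitor) and 9997 (Service Monitor) are the ports seen in the log above.
for p in (9995, 9997):
    print("hadoop05.ddxq.idc:%d open: %s" % (p, port_open("hadoop05.ddxq.idc", p)))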

Then, in /var/log/cloudera-scm-firehose, I can't find any error log.

I searched and found a similar issue: https://community.cloudera.com/t5/Cloudera-Altus-Director/MonitorDaemon-Reporter-throttling-logger-E...

I tried increasing the Service Monitor and Host Monitor memory, but that did not solve it.

Please help me!

 

CM version: 5.16.1

 

Machine config as shown (the image failed to upload):

 

8 REPLIES

Explorer

I couldn't find an option to edit my post, and all the images showed an error, so I will describe my problem again.

 

Hi,

   I have an issue in my CDH cluster. The cluster has been running for a few days, but today, in the CM Hosts tab, the Host Machine Health Test shows:

 

Agent Status
  This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.

 

The Health History shows an error, then becomes healthy again.

  2:30 PM      3 Became Good
  2:29:03 PM   Network Interface Speed Unknown; 1 Still Bad
  2:28:58 PM   1 Became Bad; 1 Became Unknown
  2:26 PM      3 Became Good
  2:25:18 PM   Network Interface Speed Unknown; 1 Still Bad
  2:25:13 PM   1 Became Bad; 1 Became Unknown
  2:23 PM      3 Became Good

CM Agent log:

[29/Dec/2018 16:42:11 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:12 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:13 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:20 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:24 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:25 +0000] 24147 MonitorDaemon-Reporter firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:42:39 +0000] 24147 MainThread agent        WARNING  Long HB processing time: 6.51629519463
[29/Dec/2018 16:42:53 +0000] 24147 MainThread agent        WARNING  Long HB processing time: 5.32933592796
[29/Dec/2018 16:45:06 +0000] 24147 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.07 min:0.10 mean:0.29 max:0.88 LIFE_MAX:0.72
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter throttling_logger ERROR    (3 skipped) Error sending messages to firehose: mgmt-SERVICEMONITOR-02008505edb7b85b2295119db7eba412
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 125, in _send
    self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 482, in transceive
    self.write_framed_message(request)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 501, in write_framed_message
    self.conn.request(req_method, self.req_resource, req_body, req_headers)
  File "/usr/lib64/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 840, in send
    self.sock.sendall(data)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
[29/Dec/2018 16:45:08 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9995<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 MainThread agent        WARNING  Long HB processing time: 6.45350694656
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:09 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:10 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:15 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:17 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:18 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:19 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:21 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<
[29/Dec/2018 16:45:22 +0000] 24147 ImpalaDaemonQueryMonitoring firehose     INFO     >>>>>>>>>>>>>>>>address : hadoop05.ddxq.idc port: 9997<<<<<<<<<<<<<<

I re-edited firehose.py to print the address and port, and telnet to that hostname and port succeeds.
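A self-contained sketch of the kind of debug print I added (the real line sits inside firehose.py's _send(); the logger setup here is only for illustration):

import logging

# Rough reconstruction of the debug line added in cmf/monitor/firehose.py;
# inside the agent it logs through the existing "firehose" logger.
logging.basicConfig(format="%(name)s %(levelname)s %(message)s", level=logging.INFO)
LOG = logging.getLogger("firehose")

def log_target(address, port):
    # Produces lines like the INFO records in the agent log above.
    LOG.info(">>>>>>>>>>>>>>>>address : %s port: %s<<<<<<<<<<<<<<", address, port)

log_target("hadoop05.ddxq.idc", 9997)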

Then, in /var/log/cloudera-scm-firehose, I can't find any error log.

I searched and found a similar issue: https://community.cloudera.com/t5/Cloudera-Altus-Director/MonitorDaemon-Reporter-throttling-logger-E...

I tried increasing the Service Monitor and Host Monitor memory, but that did not solve it.

Please help me!

 

CM version: 5.16.1

OS version: CentOS 7.3.1611

Master Guru

@zy001,

 

Thanks for discussing the issue you are facing.

This issue does not pose any threat to the functionality of your cluster, so the impact should be minimal.

 

Let's start with what we know:

 

- The agent attempts to make an HTTP connection and gets the following:

 

[29/Dec/2018 16:45:08 +0000] 24147 MonitorDaemon-Reporter throttling_logger ERROR (3 skipped) Error sending messages to firehose: mgmt-SERVICEMONITOR-02008505edb7b85b2295119db7eba412

. . . .

File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe

 

This tells us a couple of important things:

 

(1)

 

The agent was able to make a TCP connection to the Host Monitor on port 9995 (the Host Monitor Listen Port). After that, the communication was severed, as the connection appears to have been dropped somewhere between the agent and the Host Monitor server.

 

(2)

 

This appears to happen frequently.
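As a side note, a small self-contained demo (not CM code) shows why a Broken pipe implies the connect itself worked: the toy server below accepts and immediately drops the connection, the client's connect() still succeeds, and only a later send fails with [Errno 32].

import socket
import threading
import time

# Toy server: accept one connection, then close it immediately, simulating
# a peer that drops the connection right after the TCP handshake.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def accept_and_drop():
    conn, _ = srv.accept()
    conn.close()

threading.Thread(target=accept_and_drop).start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))   # succeeds: the port is reachable
time.sleep(0.5)                    # let the server close its end
try:
    # The first send after the peer's close typically succeeds and provokes
    # an RST; a subsequent send then raises [Errno 32] Broken pipe.
    for _ in range(10):
        cli.sendall(b"x" * 65536)
        time.sleep(0.1)
except socket.error as e:
    print(e)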

 

 

The agent will periodically capture host information and then upload that to the Host Monitor for indexing and storage.  The first place to look would be the Host Monitor log in /var/log/:

 

mgmt-cmf-mgmt-HOSTMONITOR*

 

You mentioned you could not find an error log, but note that there is only the one log file for all log output.

 

If you do not see anything wrong there, then we might need to employ more advanced diagnostics to determine what is happening.
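In case it helps, a quick way to pull any WARN or ERROR lines out of those logs (directory assumed from the /var/log/cloudera-scm-firehose path you mentioned):

import glob
import io

# Scan the Host Monitor logs for WARN/ERROR lines; adjust the directory
# if your Cloudera Management Service logs live elsewhere.
pattern = "/var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-HOSTMONITOR*"
for path in glob.glob(pattern):
    with io.open(path, "r", errors="replace") as f:
        for line in f:
            if " ERROR " in line or " WARN " in line:
                print("%s: %s" % (path, line.rstrip()))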

Explorer
@bgooley Thanks for the reply.
Yes, I could not find any errors in the Host Monitor or Service Monitor logs.
The Host Monitor log is as follows:
2019-01-03 07:44:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-02T23:40:00.000Z
2019-01-03 07:44:02,042 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT1.013S, numStreamsChecked=42228, numStreamsRolledUp=855
2019-01-03 07:48:01,011 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 07:49:01,028 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-02T23:49:01.028Z, forMigratedData=false
2019-01-03 07:54:01,028 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-02T23:54:01.028Z, forMigratedData=false
2019-01-03 07:54:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-02T23:50:00.000Z
2019-01-03 07:54:01,949 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.920S, numStreamsChecked=42228, numStreamsRolledUp=855
2019-01-03 07:58:01,013 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 07:59:01,028 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-02T23:59:01.028Z, forMigratedData=false
2019-01-03 07:59:29,591 INFO com.cloudera.cmf.BasicScmProxy: Failed request to SCM: 302
2019-01-03 07:59:30,591 INFO com.cloudera.cmf.BasicScmProxy: Authentication to SCM required.
2019-01-03 07:59:30,668 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2019-01-03 07:59:30,674 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2019-01-03 08:04:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:04:01.029Z, forMigratedData=false
2019-01-03 08:04:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:01,864 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.833S, numStreamsChecked=42228, numStreamsRolledUp=855
2019-01-03 08:04:01,864 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT600S to rollup=HOURLY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:02,689 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.825S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:04:02,690 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT3600S to rollup=SIX_HOURLY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:03,448 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.758S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:04:03,448 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT21600S to rollup=DAILY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:03,939 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.491S, numStreamsChecked=42228, numStreamsRolledUp=2061
2019-01-03 08:04:03,939 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT86400S to rollup=WEEKLY for rollupTimestamp=2019-01-03T00:00:00.000Z
2019-01-03 08:04:04,404 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.464S, numStreamsChecked=42228, numStreamsRolledUp=2061
2019-01-03 08:09:01,010 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 08:09:01,029 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:09:01.029Z, forMigratedData=false
2019-01-03 08:14:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:14:01.030Z, forMigratedData=false
2019-01-03 08:14:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-03T00:10:00.000Z
2019-01-03 08:14:01,961 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.929S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:19:01,014 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 08:19:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:19:01.030Z, forMigratedData=false
2019-01-03 08:24:01,030 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:24:01.030Z, forMigratedData=false
2019-01-03 08:24:01,031 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2019-01-03T00:20:00.000Z
2019-01-03 08:24:02,274 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT1.243S, numStreamsChecked=42228, numStreamsRolledUp=859
2019-01-03 08:29:01,031 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T00:29:01.031Z, forMigratedData=false
2019-01-03 08:30:01,034 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
The Service Monitor log:
2019-01-03 09:07:01,035 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1680dc02bc4071d
2019-01-03 09:07:16,122 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT167.412S, numStreamsChecked=6320720, numStreamsRolledUp=7678
2019-01-03 09:07:16,122 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from ts_stream_rollup_PT600S to rollup=HOURLY for rollupTimestamp=2019-01-03T01:00:00.000Z
2019-01-03 09:07:54,966 INFO com.cloudera.cmon.tstore.leveldb.LDBResourceManager: Closed: 0 partitions
2019-01-03 09:08:06,015 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x3ef8132e connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:08:06,020 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:08:06,025 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3680dc027f7070d
2019-01-03 09:08:43,137 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT87.015S, numStreamsChecked=6320720, numStreamsRolledUp=7681
2019-01-03 09:09:06,021 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xf496593 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:09:06,025 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:09:06,029 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3680dc027f7070f
2019-01-03 09:09:28,710 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2019-01-03T01:09:28.710Z, forMigratedData=false
2019-01-03 09:10:06,019 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4cbadc22 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:10:06,022 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:10:06,026 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1680dc02bc40724
2019-01-03 09:11:06,026 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x43eb1654 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:11:06,031 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:11:06,035 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1680dc02bc40727
2019-01-03 09:11:26,101 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop01.ddxq.idc:9083
2019-01-03 09:11:26,101 INFO hive.metastore: Opened a connection to metastore, current connections: 1
2019-01-03 09:11:26,103 INFO hive.metastore: Connected to metastore.
2019-01-03 09:11:26,812 INFO hive.metastore: Closed a connection to metastore, current connections: 0
2019-01-03 09:12:06,078 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x7b46b68a connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:12:06,082 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
2019-01-03 09:12:06,087 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3680dc027f70713
2019-01-03 09:13:11,034 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6359e1f1 connecting to ZooKeeper ensemble=hadoop05.ddxq.idc:2181,hadoop03.ddxq.idc:2181,hadoop04.ddxq.idc:2181
No error messages.
I guess this issue comes from my frequent use of Impala. I reinstalled the cluster today; at the beginning the cluster was healthy, but when I used Impala to load large amounts of data into Kudu, the issue recurred.
I think the cause may be that Impala generates a large number of logs that the agent cannot handle. One way might be to increase the agent's memory; another would be some configuration that increases the agent's capacity to process the Impala logs, but I searched and could not find how to fix this. Could you give me some suggestions? Thanks.
 

Master Guru

@zy001,

 

I agree that it is possible (and even likely) that the issue is more on the agent side.  What we really need to see is what the agent is doing at the time it is having trouble connecting.  A great way to peek at the internals is to send a SIGQUIT signal to the agent, which will trigger it to dump thread stacks to the agent log.  If you could run this a few times while CM is showing that the agent is out of contact with the Host Monitor, it might give us some clues as to whether the agent is under stress at that time.

Master Guru

Oops... clicked "POST" before telling you how to get the agent to dump the thread stacks to the agent log.  You can run the following:

 

kill -SIGQUIT `cat /var/run/cloudera-scm-agent/cloudera-scm-agent.pid`

 

This will not cause the agent to restart or anything so it won't impact processing.

If you can run the kill -SIGQUIT a few times that would give us an idea of how the threads are progressing.
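For reference, the agent wires this up with a Python signal handler; a self-contained sketch of the same idea (not the agent's exact code, though the dump in the next reply shows cmf.util's dumpstacks doing essentially this) is:

import signal
import sys
import threading
import traceback

def dumpstacks(signum, frame):
    # Walk every live thread's current frame and print its stack, in the
    # same spirit as cmf.util.dumpstacks in the CM agent.
    id2name = dict((t.ident, t.name) for t in threading.enumerate())
    for thread_id, stack in sys._current_frames().items():
        print("# Thread: %s(%d)" % (id2name.get(thread_id, "unknown"), thread_id))
        for filename, lineno, name, line in traceback.extract_stack(stack):
            print('File: "%s", line %d, in %s' % (filename, lineno, name))
            if line:
                print("  %s" % line.strip())

signal.signal(signal.SIGQUIT, dumpstacks)
signal.pause()  # keep running; send SIGQUIT to this process to see the dump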

Explorer

@bgooley I ran this command.

Then the log in /var/log/cloudera-scm-agent/cloudera-scm-agent.log shows the following:

 

Dumping all Thread Stacks ...

# Thread: Monitor-GenericMonitor(140497188263680)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496701748992)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140495628007168)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496676570880)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CP Server Thread-9(140497741920000)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496651392768)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140495091136256)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: MonitorDaemon-Reporter(140497213441792)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 50, in run
  self._fn(*self._args, **self._kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/__init__.py", line 163, in _report
  self._report_for_monitors(monitors)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/__init__.py", line 214, in _report_for_monitors
  self.firehoses.send_smon_update(service_updates, role_updates, None)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehoses.py", line 149, in send_smon_update
  impala_query_updates=impala_query_updates)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehoses.py", line 181, in _send_agent_message
  firehose.send(dict(agent_msgs=[agentmsg]))
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 107, in send
  self._send(messages)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/firehose.py", line 124, in _send
  self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 136, in request
  self.write_call_request(message_name, request_datum, buffer_encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 178, in write_call_request
  self.write_request(message.request, request_datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 182, in write_request
  datum_writer.write(request_datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 770, in write
  self.write_data(self.writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data
  self.write_record(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record
  self.write_data(field.type, datum.get(field.name), encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data
  self.write_record(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record
  self.write_data(field.type, datum.get(field.name), encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data
  self.write_array(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array
  self.write_data(writers_schema.items, item, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data
  self.write_record(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record
  self.write_data(field.type, datum.get(field.name), encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data
  self.write_array(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array
  self.write_data(writers_schema.items, item, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data
  self.write_record(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record
  self.write_data(field.type, datum.get(field.name), encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 799, in write_data
  self.write_union(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 879, in write_union
  self.write_data(writers_schema.schemas[index_of_schema], datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data
  self.write_array(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array
  self.write_data(writers_schema.items, item, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 801, in write_data
  self.write_record(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record
  self.write_data(field.type, datum.get(field.name), encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 795, in write_data
  self.write_array(writers_schema, datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 839, in write_array
  self.write_data(writers_schema.items, item, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data
  raise schema.AvroException(fail_msg)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record
  self.write_data(field.type, datum.get(field.name), encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data
  raise schema.AvroException(fail_msg)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 840, in write_array
  encoder.write_long(0)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data
  raise schema.AvroException(fail_msg)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 889, in write_record
  self.write_data(field.type, datum.get(field.name), encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data
  raise schema.AvroException(fail_msg)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 879, in write_union
  self.write_data(writers_schema.schemas[index_of_schema], datum, encoder)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 804, in write_data
  raise schema.AvroException(fail_msg)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 303, in write_int
  self.write_long(datum);
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 313, in write_long
  self.write(chr(datum))
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/io.py", line 281, in write
  self.writer.write(datum)

# Thread: HTTPServer Thread-2(140498278790912)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 764, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/servers.py", line 187, in _start_http_thread
  self.httpserver.start()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1838, in start
  self.tick()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1950, in tick
  return
File: "/usr/lib64/python2.7/socket.py", line 202, in accept
  sock, addr = self._sock.accept()

# Thread: Monitor-GenericMonitor(140495577650944)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140495586043648)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140495594436352)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: DnsResolutionMonitor(140497221834496)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/stoppable_thread.py", line 34, in run
  time.sleep(sleep)

# Thread: Monitor-GenericMonitor(140495602829056)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496148092672)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496156485376)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-HostMonitor(140497230227200)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: MainThread(140498665850688)
File: "/usr/lib64/cmf/agent/build/env/bin/cmf-agent", line 12, in <module>
  load_entry_point('cmf==5.16.1', 'console_scripts', 'cmf-agent')()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 3127, in main
  main_impl()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 3110, in main_impl
  agent.start()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 852, in start
  self.__issue_heartbeat()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 754, in __issue_heartbeat
  heartbeat_response = self.send_heartbeat(heartbeat)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 1401, in send_heartbeat
  response = self._send_heartbeat(heartbeat)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/agent.py", line 1442, in _send_heartbeat
  response = self.requestor.request('heartbeat', heartbeat_data)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 141, in request
  return self.issue_request(call_request, message_name, request_datum)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 254, in issue_request
  call_response = self.transceiver.transceive(call_request)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 483, in transceive
  result = self.read_framed_message()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 487, in read_framed_message
  response = self.conn.getresponse()
File: "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
  response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 476, in begin
  self.msg = HTTPMessage(self.fp, 0)
File: "/usr/lib64/python2.7/mimetools.py", line 25, in __init__
  rfc822.Message.__init__(self, fp, seekable)
File: "/usr/lib64/python2.7/rfc822.py", line 108, in __init__
  self.readheaders()
File: "/usr/lib64/python2.7/httplib.py", line 315, in readheaders
  line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
  data = self._sock.recv(self._rbufsize)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 193, in dumpstacks
  for filename, lineno, name, line in traceback.extract_stack(stack):

# Thread: Monitor-GenericMonitor(140496131307264)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: _TimeoutMonitor(140498287183616)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/process/plugins.py", line 471, in run
  time.sleep(self.interval)

# Thread: Monitor-GenericMonitor(140495611221760)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496114521856)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140495619614464)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496693356288)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CP Server Thread-7(140497758705408)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140495065958144)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CredentialManager(140498388784896)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/kt_renewer.py", line 181, in run
  self._trigger.wait(_RENEWAL_PERIOD)
File: "/usr/lib64/python2.7/threading.py", line 361, in wait
  _sleep(delay)

# Thread: Monitor-GenericMonitor(140496164878080)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496659785472)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496139699968)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Metadata-Plugin(140498303969024)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 764, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 489, in wrapper
  return fn(self, *args, **kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/audit/navigator_thread.py", line 168, in _monitor_logs
  time.sleep(event_poll_interval)

# Thread: CP Server Thread-11(140497725134592)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Profile-Plugin(140498295576320)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 764, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 489, in wrapper
  return fn(self, *args, **kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/audit/navigator_thread.py", line 168, in _monitor_logs
  time.sleep(event_poll_interval)

# Thread: Monitor-GenericMonitor(140496684963584)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Audit-Plugin(140498312361728)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 764, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/util/__init__.py", line 489, in wrapper
  return fn(self, *args, **kwargs)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/audit/navigator_thread.py", line 168, in _monitor_logs
  time.sleep(event_poll_interval)

# Thread: Thread-13(140497196656384)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/threadpool-1.2.7-py2.7.egg/threadpool.py", line 147, in run
  request = self._requests_queue.get(True, self._poll_timeout)
File: "/usr/lib64/python2.7/Queue.py", line 177, in get
  self.not_empty.wait(remaining)
File: "/usr/lib64/python2.7/threading.py", line 361, in wait
  _sleep(delay)

# Thread: CP Server Thread-6(140497767098112)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CP Server Thread-4(140498262005504)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140495057565440)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CP Server Thread-8(140497750312704)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: Monitor-GenericMonitor(140496668178176)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CP Server Thread-12(140497238619904)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CP Server Thread-10(140497733527296)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: (name lost in paste)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: (name lost in paste)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)

# Thread: ImpalaDaemonQueryMonitoring(140496122914560)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 871, in _check_for_queries
  completed_query_profiles))
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 909, in _get_completed_query_profiles
  return completed_query_ids, completed_query_profiles, True
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 601, in get_completed_queries
  return completed_queries
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 484, in get_completed_queries
  return next_start_datetime, next_last_file_timestamp, completed_query_profiles
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 313, in _get_completed_queries
  return next_start_datetime, latest_file_timestamp, completed_queries
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_streamer.py", line 114, in __init__
  self.__filtered_file_list = self.__apply_file_filter()
  return filter_context["file_list"]
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsCommon-0.1-py2.7.egg/clusterstats/common/chain.py", line 25, in __call__
  return True
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 103, in __call__
  return True
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/impalad/query_monitor.py", line 108, in __set_start_offset
  f.set_start_offset(event.get_offset())
  return event
  return event
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/event_reader.py", line 84, in get_prev_event
  return event
  return event, ''.join(prior_data)
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file_line_reader.py", line 86, in get_next_line
  return line
  return data, block_offset
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/ClusterStatsLogStreaming-UNKNOWN-py2.7.egg/clusterstats/log/streaming/file_line_reader.py", line 203, in __read_data_till_next_newline
  return line

# Thread: MonitorDaemon-Scheduler(140497205049088)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.16.1-py2.7.egg/cmf/monitor/wakeable_thread.py", line 34, in run
  self._cv.wait(wait_time)
File: "/usr/lib64/python2.7/threading.py", line 361, in wait
  _sleep(delay)

# Thread: CP Server Thread-5(140497775490816)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()

# Thread: CP Server Thread-3(140498270398208)
File: "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
  self.run()
File: "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/CherryPy-3.2.2-py2.7.egg/cherrypy/wsgiserver/wsgiserver2.py", line 1437, in run
  conn = self.server.requests.get()
File: "/usr/lib64/python2.7/Queue.py", line 168, in get
  self.not_empty.wait()
File: "/usr/lib64/python2.7/threading.py", line 339, in wait
  waiter.acquire()
 
I dumped all the logs above and hope you can help me find the issue. Thanks a lot!
avatar
New Contributor

We are facing the exact same issue.

Could you please help us understand how you managed to resolve it?

Thanks,

Pratik

avatar
Master Guru

Hi @AstroPratik ,

 

First, so that we can provide the best help, we need to confirm the details of the issue you are observing. My guess is that you are seeing the same health alert in Cloudera Manager, but we also need to confirm that you are seeing the same messages in the agent log.
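For a quick check, something like the following should surface the relevant lines (a sketch; the path assumes the default agent log location):

grep -E "Error sending messages to firehose|Long HB processing time" /var/log/cloudera-scm-agent/cloudera-scm-agent.log | tail -n 20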

 

If so, please follow the instructions earlier in this thread to capture a thread dump via the SIGQUIT signal. The "kill -SIGQUIT" command I provided there only works on Cloudera Manager 5.x; if you are using CM 6, you can use the following instead:

kill -SIGQUIT $(systemctl show -p MainPID cloudera-scm-agent.service 2>/dev/null | cut -d= -f2)
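
On CM 5.x, assuming the agent's default pid file location, an equivalent would be:

kill -SIGQUIT $(cat /var/run/cloudera-scm-agent/cloudera-scm-agent.pid)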

 

If you do run kill -SIGQUIT, make sure to run it a couple of times so we can compare snapshots, and make sure you capture the thread dumps while the problem is occurring.
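
For example, a small loop like this sketch (the count and the 30-second spacing are arbitrary choices) captures several snapshots in a row:

PID=$(systemctl show -p MainPID cloudera-scm-agent.service 2>/dev/null | cut -d= -f2)
for i in 1 2 3; do
  kill -SIGQUIT "$PID"   # dump thread stacks; the output should land in the agent log
  sleep 30               # arbitrary gap so consecutive snapshots can be compared
done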

 

NOTE: After reviewing the previous poster's thread dump, it appears that the thread spawned to collect information for a diagnostic bundle is slow, and the thread that uploads service and host information to the Host Monitor and Service Monitor servers also seems to be slow.

 

Since diagnostic bundles are not collected often, it is likely that the bundle creation is what triggered the original alert.
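
One rough way to check that theory (just a heuristic; the search term is a guess, not an exact log message) is to look for diagnostic-bundle activity around the time of the alert:

grep -i "diagnostic" /var/log/cloudera-scm-agent/cloudera-scm-agent.log | tail -n 20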

 

There are a number of possible causes for "firehose" trouble, though, so it is important that we understand the facts of your situation before drawing any conclusions.