Agent Heartbeat not working with error: ConnectionClosedException Reader read 0 bytes

Explorer

We are trying to install Cloudera Manager and CDH on our cluster, but we are running into errors.

 

The agent's error log shows:

>>[20/Sep/2018 10:25:06 +0000] 9377 MainThread agent ERROR Heartbeating to node001:7182 failed.
>>Traceback (most recent call last):
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1371, in _send_heartbeat
>> response = self.requestor.request('heartbeat', heartbeat_data)
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
>> return self.issue_request(call_request, message_name, request_datum)
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
>> call_response = self.transceiver.transceive(call_request)
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
>> result = self.read_framed_message()
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 491, in read_framed_message
>> framed_message = response_reader.read_framed_message()
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 411, in read_framed_message
>> buffer_length = self._read_buffer_length()
>> File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 424, in _read_buffer_length
>> raise ConnectionClosedException("Reader read 0 bytes.")
>>ConnectionClosedException: Reader read 0 bytes.

 

I have checked /etc/hosts and everything else that was mentioned in similar cases. Nothing helped; I still get no heartbeat from the nodes.

 

Do you have any clue what I could do next?

Thanks.


Contributor

Have you verified with telnet that port 7182 is open?

Explorer

Yes, it is open:

netstat -taupen | grep 7182
tcp        0      0 0.0.0.0:7182            0.0.0.0:*               LISTEN      899        19129      1038/java      

 

I can connect with telnet:

telnet node001 7182
Trying 192.168.193.1...
Connected to node001.
Escape character is '^]'.
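
For anyone who wants to script the same reachability check across many nodes, here is a minimal Python sketch. The function name port_is_open is made up for illustration, and it only proves that a TCP handshake succeeds, exactly like the telnet test above. It says nothing about whether the server accepts the heartbeat afterwards.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper: equivalent in spirit to `telnet host port`.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling port_is_open("node001", 7182) from each agent host should return True if the port is reachable from there.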

Contributor

Could you go through the troubleshooting steps in the link below and see whether anything needs fixing on your end?

 

https://community.cloudera.com/t5/Cloudera-Manager-Installation/Getting-quot-heartbeat-quot-errors-w...

 

 

Explorer

I already followed these instructions:

1. IP Address misconfiguration:

$ ifconfig -a
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
       inet 192.168.193.1  netmask 255.255.248.0  broadcast 192.168.199.255
       inet6 fe80::a6bf:1ff:fe06:6539  prefixlen 64  scopeid 0x20<link>
       ether a4:bf:01:06:65:39  txqueuelen 1000  (Ethernet)
       RX packets 420251  bytes 120871170 (115.2 MiB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 501994  bytes 246473312 (235.0 MiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
       device memory 0x95920000-9593ffff   

eth1: flags=4098<BROADCAST,MULTICAST>  mtu 1500
       ether a4:bf:01:06:65:3a  txqueuelen 1000  (Ethernet)
       RX packets 0  bytes 0 (0.0 B)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 0  bytes 0 (0.0 B)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
       device memory 0x95900000-9591ffff   

ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
       inet 192.168.209.1  netmask 255.255.248.0  broadcast 192.168.215.255
       inet6 fe80::211:7501:178:fc93  prefixlen 64  scopeid 0x20<link>
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
       infiniband 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
       RX packets 7788  bytes 1968252 (1.8 MiB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 14515  bytes 2118284 (2.0 MiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
       inet 127.0.0.1  netmask 255.0.0.0
       inet6 ::1  prefixlen 128  scopeid 0x10<host>
       loop  txqueuelen 1000  (Lokale Schleife)
       RX packets 569168  bytes 252684382 (240.9 MiB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 569168  bytes 252684382 (240.9 MiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

2. Firewalls are disabled

iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination          

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination          

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination   

 

3. DNS is misconfigured

$ nslookup node001
Server:         192.168.192.1
Address:        192.168.192.1#53

Name:   node001.ara
Address: 192.168.193.1
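
Because this host has two configured interfaces (eth0 on 192.168.193.1 and ib0 on 192.168.209.1), it can be worth confirming from each node that the name resolves to the address you expect. Below is a rough Python sketch. The helper name resolves_to and the injectable resolver parameter are my own additions for illustration.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if hostname resolves (IPv4) to expected_ip.

    Hypothetical helper. `resolver` defaults to the system resolver, which
    honours /etc/hosts as well as DNS, and can be swapped out for testing.
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses.
        return False
    return expected_ip in addresses
```

With the output shown above, resolves_to("node001", "192.168.193.1") would be the check to run on every node.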

Master Guru

@rseidler,

 

Since the basics are covered, I'll say that the stack trace you provided looks pretty odd and indicates that the agent was reading a reply from Cloudera Manager but before it could complete, the connection went away...

 


We see in the code that this means that no bytes were received:

    421   def _read_buffer_length(self):
    422     read = self.reader.read(BUFFER_HEADER_LENGTH)
    423     if read == '':
    424       raise ConnectionClosedException("Reader read 0 bytes.")
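
To illustrate why a zero-byte read means "the other side closed the connection", here is a small self-contained Python 3 sketch. It is not the avro code: ConnectionClosedError and read_header are stand-in names, and the 4-byte header length is my assumption about the framing. In Python 3 the empty read is b"" rather than the '' seen in the Python 2.7 snippet above.

```python
import socket


class ConnectionClosedError(Exception):
    """Stand-in for avro's ConnectionClosedException (illustrative only)."""


def read_header(sock, length=4):
    """Read up to `length` bytes; an empty result means the peer closed."""
    data = sock.recv(length)
    if data == b"":
        # recv() returns b"" only when the peer has shut down the connection.
        raise ConnectionClosedError("Reader read 0 bytes.")
    return data


def demo():
    """Close one end of a socket pair and read from the other end."""
    a, b = socket.socketpair()
    b.close()  # simulates the server dropping the connection without replying
    try:
        read_header(a)
    except ConnectionClosedError as exc:
        return str(exc)
    finally:
        a.close()
    return None
```

The point is that the connection was established and then closed by the peer before any reply bytes arrived, which is different from a refused or timed-out connection.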

 

This does not appear to be a TCP problem, so I would assert that we will likely find some more information on the Cloudera Manager side.

Please check:

 

/var/log/cloudera-scm-server/cloudera-scm-server.log

 

See if you find messages regarding that host or regarding a problem processing heartbeats.
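
If the log is large, a quick filter can help. The following is only a sketch: it assumes the log lines start with date, time, and level (as in typical cloudera-scm-server.log output), and the function name find_relevant_lines is hypothetical.

```python
def find_relevant_lines(log_lines, hostname, levels=("WARN", "ERROR")):
    """Return log lines that mention `hostname` or have a level in `levels`.

    Assumes the layout '<date> <time> <LEVEL> <rest>'; lines that do not
    match that layout are only checked for the hostname.
    """
    hits = []
    for line in log_lines:
        parts = line.split(None, 3)
        level = parts[2] if len(parts) > 2 else ""
        if level in levels or hostname in line:
            hits.append(line.rstrip("\n"))
    return hits
```

You could feed it open("/var/log/cloudera-scm-server/cloudera-scm-server.log") together with the name of the host that fails to heartbeat.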

 

Since this is a "clean" problem, I suspect that Cloudera Manager may not be accepting the heartbeat and should hopefully tell you why.

 

*** NOTE:  If the Cloudera Manager log doesn't show any relevant information, check which process is listening on port 7182, just to make sure it really is CM:

 

# netstat -nap |grep 7182 |grep LISTEN

(note the pid)

# ps aux |grep <pid>

 

For example:

 

# netstat -nap |grep 7182|grep LISTEN

tcp        0      0 0.0.0.0:7182            0.0.0.0:*               LISTEN      28669/java

# ps aux |grep 28669 |grep "cmf.Main"

 

This should return one result which is the CM process.
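
The same lookup can be scripted. Here is a sketch in Python that pulls the PID and program name out of netstat output for a given listening port. The function name listener_pid is made up, and it assumes the usual Linux netstat column layout with the owner shown as "PID/Program".

```python
def listener_pid(netstat_output, port):
    """Return (pid, program) for the LISTEN socket on `port`, or None.

    Works on text captured from `netstat -nap` or `netstat -taupen`; the
    owner column is located by looking for the first 'PID/Program' field.
    """
    suffix = ":%d" % port
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[0] not in ("tcp", "tcp6"):
            continue
        local_addr, state = fields[3], fields[5]
        if state != "LISTEN" or not local_addr.endswith(suffix):
            continue
        for field in fields[6:]:
            pid, sep, program = field.partition("/")
            if sep and pid.isdigit():
                return int(pid), program
    return None
```

For the example line above this gives PID 28669 and program "java"; you would then still confirm with ps that the process is cmf.Main.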

Explorer

@bgooley,

 

These are the last lines of the server log:

2018-09-21 09:53:47,602 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2018-09-21 09:53:47,604 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2016-09-21T07:53:47.603Z to reap.
2018-09-21 09:53:47,604 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Wizard is active, not reaping scanners or configurators
2018-09-21 09:54:04,321 INFO avro-servlet-hb-processor-13:com.cloudera.server.common.AgentAvroServlet: (11 skipped) AgentAvroServlet: heartbeat processing stats: average=0ms, min=0ms, max=16ms.
2018-09-21 09:54:50,235 INFO ScmActive-0:com.cloudera.server.cmf.components.ScmActive: (119 skipped) ScmActive completed successfully.
2018-09-21 09:55:04,366 INFO avro-servlet-hb-processor-1:com.cloudera.server.common.AgentAvroServlet: (11 skipped) AgentAvroServlet: heartbeat processing stats: average=0ms, min=0ms, max=16ms.
2018-09-21 09:55:19,306 INFO agentServer-316:com.cloudera.server.common.MonitoringThreadPool: agentServer: execution stats: average=1125ms, min=0ms, max=5012ms.
2018-09-21 09:55:19,307 INFO agentServer-316:com.cloudera.server.common.MonitoringThreadPool: agentServer: waiting in queue stats: average=0ms, min=0ms, max=8ms.
2018-09-21 09:56:04,417 INFO avro-servlet-hb-processor-13:com.cloudera.server.common.AgentAvroServlet: (11 skipped) AgentAvroServlet: heartbeat processing stats: average=0ms, min=0ms, max=16ms.
2018-09-21 09:57:04,472 INFO avro-servlet-hb-processor-1:com.cloudera.server.common.AgentAvroServlet: (11 skipped) AgentAvroServlet: heartbeat processing stats: average=0ms, min=0ms, max=16ms.

 

The server seems to skip the heartbeats and I don't know how I can see why. Any clue?

Master Guru

@rseidler,

 

I think "skipped" appears because the logging is throttled (only one of many such lines is printed), so it refers to skipped log lines, not skipped heartbeats.
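
To make the idea concrete, here is a toy Python sketch of throttled logging. This is not Cloudera Manager's implementation, only my guess at the general pattern: ThrottledLogger, its one-minute interval, and the prefix format are assumptions chosen to resemble the "(11 skipped)" lines in your log.

```python
import time


class ThrottledLogger:
    """Emit at most one message per interval and count the suppressed ones."""

    def __init__(self, emit, interval_seconds=60.0, clock=time.monotonic):
        self._emit = emit              # callable that receives the final text
        self._interval = interval_seconds
        self._clock = clock            # injectable so the class can be tested
        self._last_emit = None
        self._skipped = 0

    def log(self, message):
        now = self._clock()
        if self._last_emit is not None and now - self._last_emit < self._interval:
            # Too soon since the last printed line: count it, print nothing.
            self._skipped += 1
            return
        prefix = "(%d skipped) " % self._skipped if self._skipped else ""
        self._emit(prefix + message)
        self._last_emit = now
        self._skipped = 0
```

With a stats message produced every 5 seconds and a 60-second interval, every printed line after the first would carry the prefix "(11 skipped)".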

Are all your agents having trouble heartbeating or just one or two?

 

Maybe take a screenshot of your Hosts tab in CM to give us an idea. That log snippet doesn't tell me much other than that some agents have sent a heartbeat at some point since CM was started.

New Contributor

I faced the same issue.

It turned out to be caused by AutoTLS being enabled, which is a feature of the Enterprise version only.

This is not obvious from the setup tutorial.