Explorer
Posts: 20
Registered: ‎11-25-2016

Installation failed. Failed to receive heartbeat from agent | cluster setup

Hi,

Sorry for posting a question that many people have already asked, but I still have not been able to resolve it.
I am trying to set up a cluster using the Cloudera Manager UI and am running into the following error:
Installation failed. Failed to receive heartbeat from agent.

  • Ensure that the host's hostname is configured properly.
  • Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
  • Ensure that ports 9000 and 9001 are not in use on the host being added.
  • Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
  • If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.
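
The first bullet (hostname configuration) can be checked from the host being added. The following is a minimal Python sketch, not something Cloudera Manager ships: the function names are hypothetical, and the idea that a hostname resolving to a loopback address is a problem is an assumption about one common misconfiguration, not something the error message itself states.

```python
import ipaddress
import socket


def is_loopback(ip: str) -> bool:
    """Return True if ip is a loopback address such as 127.0.0.1 or 127.0.1.1."""
    return ipaddress.ip_address(ip).is_loopback


def check_hostname() -> str:
    """Describe how this host's own fully qualified name resolves."""
    fqdn = socket.getfqdn()
    try:
        ip = socket.gethostbyname(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"{fqdn}: does not resolve ({exc})"
    if is_loopback(ip):
        # Assumption: a loopback mapping is suspicious for a cluster host.
        return f"{fqdn}: resolves to loopback address {ip}; check /etc/hosts"
    return f"{fqdn}: resolves to {ip}"
```

Running `print(check_hostname())` on each host being added gives a quick view of what name and address that host believes it has.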


My hostnames are configured properly, the ports are open, and the last point does not apply because TLS encryption is not enabled.

Can anybody help me debug this?
It is also annoying that Cloudera Manager responds with such a generic list of possible causes; it should report the exact error, such as which port is not open. Now I am left to play hide and seek with all the points, even though as far as I can tell I have configured everything correctly. Seriously, Cloudera people, can't you report the exact error rather than giving me a generic list?


Explorer
Posts: 20
Registered: ‎11-25-2016

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

The only error logs I could find on the agent were:
[16/Dec/2016 16:49:49 +0000] 1779 Monitor-HostMonitor throttling_logger ERROR Timeout with args ['ntpdc', '-np']
[16/Dec/2016 16:49:49 +0000] 1779 Monitor-HostMonitor throttling_logger ERROR Failed to collect NTP metrics

Posts: 393
Topics: 1
Kudos: 87
Solutions: 51
Registered: ‎04-22-2014

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

hello @tarantino,

 

Sorry, I meant to get back to you before.  Each agent sends periodic heartbeats (that contain information about the status of services and other stuff) to Cloudera Manager.  If those heartbeats are not received by Cloudera Manager during the installation, that means that the heartbeat could not reach CM for some reason.

 

To tell why, we need to look at /var/log/cloudera-scm-agent/cloudera-scm-agent.log on the host that is not able to heartbeat.

It appears you have already looked at it since you pulled some lines from it.

NTP failures will not prevent the agent from heartbeating. You might try grepping for "heartbeat" in the log to see whether any exceptions are thrown when the agent tries to heartbeat to Cloudera Manager.

 

If you can find those and post them, we can probably suggest a cause.
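
For anyone who prefers a script to grep, here is a minimal Python sketch of the same search. The function name is hypothetical and the path in the usage note is only an example; point it at whichever log you are examining.

```python
from typing import Iterable, List


def grep_heartbeat(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention 'heartbeat', ignoring case."""
    return [line.rstrip("\n") for line in lines if "heartbeat" in line.lower()]
```

A typical call would be `grep_heartbeat(open("/var/log/cloudera-scm-agent/cloudera-scm-agent.log"))`, printing each returned line and looking for tracebacks near them.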

 

Regards,

 

Ben

Posts: 393
Topics: 1
Kudos: 87
Solutions: 51
Registered: ‎04-22-2014

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

One more thing. The fact that the "ntpdc -np" command did not return may indicate that ports are not open on that host or that there are other network problems. You may wish to make sure the firewall is disabled and that you can at least telnet from the new host to Cloudera Manager on port 7182 (the heartbeat listening port on CM).
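
If telnet is not installed on the new host, the same reachability test can be sketched in Python. This is a rough equivalent under stated assumptions: `can_connect` is a hypothetical helper name, and the hostname in the usage note is a placeholder for your own Cloudera Manager host.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False
```

For example, `can_connect("cm-host.example.com", 7182)` run from the host being added should return True if the heartbeat port is reachable. Like telnet, this only proves that a TCP connection can be opened, not that heartbeats are accepted.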

Explorer
Posts: 20
Registered: ‎11-25-2016

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

[ Edited ]

Telnet from all four hosts to the Cloudera Manager box on port 7182 is successful. So, again, the firewall is disabled.
Regarding the logs, I have been checking 'cloudera-scm-agent' for all the details (on the host machines intended to be part of the cluster).
I believe you were talking about the server logs and not the agent logs, but I didn't find anything in cloudera-scm-server either. The grep results for the CM server log are mostly clutter, fair to say.

Explorer
Posts: 20
Registered: ‎11-25-2016

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

Pasting the relevant server logs here (openstack22, openstack3, openstack2, and openstack5 are the agent machines):

2016-12-19 11:25:11,031 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from REPO_INSTALL (PT0.001S) to REFRESH_METADATA
2016-12-19 11:25:23,308 INFO 1727472332@agentServer-1259:com.cloudera.server.common.MonitoringThreadPool: agentServer: execution stats: average=16ms, min=0ms, max=30ms.
2016-12-19 11:25:23,308 INFO 1727472332@agentServer-1259:com.cloudera.server.common.MonitoringThreadPool: agentServer: waiting in queue stats: average=0ms, min=0ms, max=1ms.
2016-12-19 11:25:39,036 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from REFRESH_METADATA (PT28.005S) to PACKAGE_INSTALL cloudera-manager-agent
2016-12-19 11:25:40,037 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from PACKAGE_INSTALL cloudera-manager-agent (PT1.001S) to PACKAGE_INSTALL cloudera-manager-daemons
2016-12-19 11:25:40,037 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from PACKAGE_INSTALL cloudera-manager-daemons (PT0S) to INSTALL_JCE
2016-12-19 11:25:40,037 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from INSTALL_JCE (PT0S) to AGENT_CONFIGURE
2016-12-19 11:25:40,037 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from AGENT_CONFIGURE (PT0S) to AGENT_START
2016-12-19 11:25:40,615 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from AGENT_START (PT0.578S) to SCRIPT_SUCCESS
2016-12-19 11:25:40,615 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Transitioning from SCRIPT_SUCCESS (PT0S) to WAIT_FOR_HEARTBEAT
2016-12-19 11:25:45,250 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Transitioning from REFRESH_METADATA (PT35.006S) to PACKAGE_INSTALL cloudera-manager-agent
2016-12-19 11:25:45,251 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Transitioning from PACKAGE_INSTALL cloudera-manager-agent (PT0.001S) to PACKAGE_INSTALL cloudera-manager-daemons
2016-12-19 11:25:46,151 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Transitioning from PACKAGE_INSTALL cloudera-manager-daemons (PT0.900S) to INSTALL_JCE
2016-12-19 11:25:46,151 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Transitioning from INSTALL_JCE (PT0S) to AGENT_CONFIGURE
2016-12-19 11:25:46,151 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Transitioning from AGENT_CONFIGURE (PT0S) to AGENT_START
2016-12-19 11:25:46,152 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Transitioning from AGENT_START (PT0.001S) to SCRIPT_SUCCESS
2016-12-19 11:25:46,152 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Transitioning from SCRIPT_SUCCESS (PT0S) to WAIT_FOR_HEARTBEAT
2016-12-19 11:25:50,221 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Transitioning from REFRESH_METADATA (PT40.007S) to PACKAGE_INSTALL cloudera-manager-agent
2016-12-19 11:25:51,221 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Transitioning from PACKAGE_INSTALL cloudera-manager-agent (PT1S) to PACKAGE_INSTALL cloudera-manager-daemons
2016-12-19 11:25:51,221 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Transitioning from PACKAGE_INSTALL cloudera-manager-daemons (PT0S) to INSTALL_JCE
2016-12-19 11:25:51,221 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Transitioning from INSTALL_JCE (PT0S) to AGENT_CONFIGURE
2016-12-19 11:25:51,221 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Transitioning from AGENT_CONFIGURE (PT0S) to AGENT_START
2016-12-19 11:25:52,275 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Transitioning from AGENT_START (PT1.054S) to SCRIPT_SUCCESS
2016-12-19 11:25:52,275 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Transitioning from SCRIPT_SUCCESS (PT0S) to WAIT_FOR_HEARTBEAT
2016-12-19 11:25:56,316 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Transitioning from REFRESH_METADATA (PT46.008S) to PACKAGE_INSTALL cloudera-manager-agent
2016-12-19 11:25:56,316 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Transitioning from PACKAGE_INSTALL cloudera-manager-agent (PT0S) to PACKAGE_INSTALL cloudera-manager-daemons
2016-12-19 11:25:57,317 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Transitioning from PACKAGE_INSTALL cloudera-manager-daemons (PT1.001S) to INSTALL_JCE
2016-12-19 11:25:57,317 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Transitioning from INSTALL_JCE (PT0S) to AGENT_CONFIGURE
2016-12-19 11:25:57,317 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Transitioning from AGENT_CONFIGURE (PT0S) to AGENT_START
2016-12-19 11:25:58,162 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Transitioning from AGENT_START (PT0.845S) to SCRIPT_SUCCESS
2016-12-19 11:25:58,162 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Transitioning from SCRIPT_SUCCESS (PT0S) to WAIT_FOR_HEARTBEAT
2016-12-19 11:26:23,733 INFO 1171453448@agentServer-1260:com.cloudera.server.common.MonitoringThreadPool: agentServer: execution stats: average=16ms, min=0ms, max=30ms.
2016-12-19 11:26:23,733 INFO 1171453448@agentServer-1260:com.cloudera.server.common.MonitoringThreadPool: agentServer: waiting in queue stats: average=0ms, min=0ms, max=1ms.
2016-12-19 11:26:40,809 INFO NodeConfiguratorThread-24-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack22: Setting WAIT_FOR_HEARTBEAT as failed and done state
2016-12-19 11:26:40,809 INFO NodeConfiguratorThread-24-1:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION
2016-12-19 11:26:46,338 INFO NodeConfiguratorThread-24-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack5: Setting WAIT_FOR_HEARTBEAT as failed and done state
2016-12-19 11:26:46,338 INFO NodeConfiguratorThread-24-0:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION
2016-12-19 11:26:52,463 INFO NodeConfiguratorThread-24-2:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack2: Setting WAIT_FOR_HEARTBEAT as failed and done state
2016-12-19 11:26:52,464 INFO NodeConfiguratorThread-24-2:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION
2016-12-19 11:26:58,369 INFO NodeConfiguratorThread-24-3:com.cloudera.server.cmf.node.NodeConfiguratorProgress: openstack3: Setting WAIT_FOR_HEARTBEAT as failed and done state
2016-12-19 11:26:58,369 INFO NodeConfiguratorThread-24-3:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION
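
In the log above, every host reaches WAIT_FOR_HEARTBEAT and is marked failed roughly 60 seconds later (for example, openstack22 enters the state at 11:25:40 and fails at 11:26:40). To summarize a long server log per host, here is a minimal Python sketch. The regular expressions are written against the lines pasted above only and are an assumption about the format; other Cloudera Manager versions may log differently.

```python
import re
from typing import Dict, Iterable

# Patterns derived from the NodeConfiguratorProgress lines shown above.
TRANSITION = re.compile(
    r"NodeConfiguratorProgress: (\S+): Transitioning from .* to (.+)$"
)
FAILED = re.compile(
    r"NodeConfiguratorProgress: (\S+): Setting (\S+) as failed and done state"
)


def last_states(lines: Iterable[str]) -> Dict[str, str]:
    """Map each host to the last install state the server logged for it."""
    states: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        failed = FAILED.search(line)
        if failed:
            states[failed.group(1)] = f"FAILED at {failed.group(2)}"
            continue
        moved = TRANSITION.search(line)
        if moved:
            states[moved.group(1)] = moved.group(2)
    return states
```

Feeding it the server log shows at a glance which install step each host stopped at, which is easier than reading the interleaved threads by eye.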

Contributor
Posts: 33
Registered: ‎05-12-2016

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

Which version of CDH did you install?

 

I had the same problem with CDH 5.7.5. I just omitted this error and moved forward. Heartbeat was ok from CM host tab.

 

Also, check reverse lookup entries, e.g.

host ip-address-here

 

There should be only one entry.
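
The same reverse-lookup check can be sketched in Python. Note the assumptions: `reverse_names` is a hypothetical helper, and `socket.gethostbyaddr` goes through the system resolver (which may consult /etc/hosts), so its answer can differ from what the `host` command gets from DNS alone.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def reverse_names(ip: str, resolver: Resolver = socket.gethostbyaddr) -> List[str]:
    """Return every name the reverse lookup reports for ip, primary name first."""
    try:
        primary, aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return []
    return [primary] + [alias for alias in aliases if alias != primary]
```

An empty list means the address has no reverse entry; more than one name means the lookup is ambiguous and is worth cleaning up.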


Explorer
Posts: 20
Registered: ‎11-25-2016

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

I am trying to install CDH 5.9. I can't omit the error and move forward because the installation is being done via the Cloudera Manager UI. (I chose the UI because I had earlier done the installation via the manual path, but the Spark jobs were not using all workers and there were plenty of other issues, such as heartbeats not being received.)

Explorer
Posts: 20
Registered: ‎11-25-2016

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

@bgooley any pointers?

Explorer
Posts: 20
Registered: ‎11-25-2016

Re: Installation failed. Failed to receive heartbeat from agent | cluster setup

Netstat shows a Python/Cloudera service running on port 9000 but nothing on port 9001.

netstat -tulnp|grep -w 9000
tcp 0 0 172.19.103.72:9000 0.0.0.0:* LISTEN 32138/python2.7
netstat -tulnp|grep -w 9001
ps -ef | grep python
root 1960 16701 0 19:18 pts/11 00:00:00 grep --color=auto python
talenti+ 3871 1676 0 Oct24 ? 00:01:42 /usr/bin/python3 /usr/bin/update-manager --no-update --no-focus-on-map
root 25996 1 0 Nov01 ? 00:30:16 /usr/lib/cmf/agent/build/env/bin/python /usr/lib/cmf/agent/build/env/bin/supervisord
root 25997 25996 0 Nov01 ? 00:00:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-listener -l /var/log/cloudera-scm-agent/cmf_listener.log /run/cloudera-scm-agent/events
root 32138 1 0 18:48 ? 00:00:08 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-agent --package_dir /usr/lib/cmf/service --agent_dir /var/run/cloudera-scm-agent --lib_dir /var/lib/cloudera-scm-agent --logfile /var/log/cloudera-scm-agent/cloudera-scm-agent.log --daemon --comm_name cmf-agent --pidfile /var/run/cloudera-scm-agent/cloudera-scm-agent.pid
root 32305 25996 0 18:48 ? 00:00:02 python2.7 /usr/lib/cmf/agent/build/env/bin/flood


But both ports are open, so I am not able to figure out why.
Also, how can I test whether port 9001 is open? I suppose it can't be checked via telnet, as none of our services run on port 9001.
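
One way to read the "not in use" bullet is that nothing else should already be bound to the port, which can be tested locally without telnet by trying to bind it. A minimal Python sketch follows; `port_is_free` is a hypothetical helper name. In the netstat and ps output above, the listener on port 9000 is PID 32138, which is cmf-agent itself.

```python
import socket


def port_is_free(port: int, address: str = "0.0.0.0") -> bool:
    """Return True if this process can bind address:port, i.e. nothing holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:  # typically "Address already in use"
            return False
    return True
```

Calling `port_is_free(9001)` on the host being added tells you whether another process holds the port. It says nothing about firewalls; to test reachability from another machine you would need something listening on 9001 temporarily and a connection attempt from the remote side.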
