
Installation failed. Failed to receive heartbeat from agent.

Explorer

Hi all,

 

I'm trying to install Cloudera Express 5.0.1 on my Linux machines. Currently I have a Linux host that has Cloudera Manager installed. I have two VMs running on this host, one of which I want to make a Cloudera server and the other a slave. While trying to do an automatic install, I get the following error:

 

 Installation failed. Failed to receive heartbeat from agent.

 

 

 The agent logs on the master-to-be show the following:

 


[27/May/2014 18:13:33 +0000] 12809 MainThread agent INFO Stopping agent...

[27/May/2014 18:13:33 +0000] 12809 MainThread agent INFO No extant cgroups; unmounting any cgroup roots
[27/May/2014 18:13:33 +0000] 12809 MainThread agent INFO No processes are being managed; Supervisor will shutdown.
[27/May/2014 18:13:33 +0000] 12809 MainThread agent INFO Shutting down supervisord, pid 12833
[27/May/2014 18:13:34 +0000] 12809 MonitorDaemon-Reporter __init__ INFO Couldn't get supervisord metrics: process no longer exists (pid=12833)
[27/May/2014 18:13:34 +0000] 12809 MainThread agent INFO waiting for process to terminate...
[27/May/2014 18:13:34 +0000] 12809 MainThread agent INFO Successfully killed process with pid 12833
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE Bus STOPPING
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('hadoop-master.cccs.uwe.ac.uk', 9000)) shut down
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE Stopped thread '_TimeoutMonitor'.
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE Bus STOPPED
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE Bus STOPPING
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('hadoop-master.cccs.uwe.ac.uk', 9000)) already shut down
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE No thread running for None.
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE Bus STOPPED
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE Bus EXITING
[27/May/2014 18:13:34 +0000] 12809 MainThread _cplogging INFO [27/May/2014:18:13:34] ENGINE Bus EXITED
[27/May/2014 18:13:34 +0000] 12809 MainThread agent INFO Agent exiting; caught signal 15
[27/May/2014 18:13:34 +0000] 13591 MainThread agent INFO No command line vars
[27/May/2014 18:13:34 +0000] 13591 MainThread agent INFO Missing database jar: /usr/share/java/mysql-connector-java.jar (normal, if you're not using this database type)
[27/May/2014 18:13:34 +0000] 13591 MainThread agent INFO Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
[27/May/2014 18:13:34 +0000] 13591 MainThread agent INFO Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
[27/May/2014 18:13:34 +0000] 13591 MainThread agent INFO Agent starting as pid 13591 user root(0) group root(0).
[27/May/2014 18:13:34 +0000] 13591 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/cgroups
[27/May/2014 18:13:36 +0000] 13591 MainThread cgroups INFO cgroup pseudofile /tmp/tmprPEj0w/cpu.rt_runtime_us does not exist, skipping
[27/May/2014 18:13:36 +0000] 13591 MainThread cgroups INFO Failed to mount cgroups subsystem memory to /tmp/tmp3FjlAD, rc: 32 stderr: mount: special device cm_cgroups does not exist

[27/May/2014 18:13:36 +0000] 13591 MainThread cgroups INFO Reusing /run/cloudera-scm-agent/cgroups/cpu
[27/May/2014 18:13:36 +0000] 13591 MainThread cgroups INFO Reusing /run/cloudera-scm-agent/cgroups/cpuacct
[27/May/2014 18:13:36 +0000] 13591 MainThread cgroups INFO Reusing /run/cloudera-scm-agent/cgroups/blkio
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Found cgroups capabilities: {'has_memory': False, 'default_memory_limit_in_bytes': -1, 'default_blkio_weight': 1000, 'writable_cgroup_dot_procs': True, 'default_cpu_rt_runtime_us': -1, 'has_cpu': True, 'default_memory_soft_limit_in_bytes': -1, 'has_cpuacct': True, 'default_cpu_shares': 1024, 'has_blkio': True}
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Setting up supervisord event monitor.
[27/May/2014 18:13:36 +0000] 13591 MainThread filesystem_map INFO Monitored nodev filesystem types: ['nfs', 'nfs4', 'tmpfs']
[27/May/2014 18:13:36 +0000] 13591 MainThread filesystem_map INFO Using timeout of 2.000000
[27/May/2014 18:13:36 +0000] 13591 MainThread filesystem_map INFO Using join timeout of 0.100000
[27/May/2014 18:13:36 +0000] 13591 MainThread filesystem_map INFO Using tolerance of 60.000000
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Using metrics_url_timeout_seconds of 30.000000
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Using task_metrics_timeout_seconds of 5.000000
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Using max_collection_wait_seconds of 10.000000
[27/May/2014 18:13:36 +0000] 13591 MainThread metrics INFO Importing tasktracker metric schema from file /usr/lib/cmf/agent/src/cmf/monitor/tasktracker/schema.json
[27/May/2014 18:13:36 +0000] 13591 MainThread dns_names INFO Using timeout of 2.000000
[27/May/2014 18:13:36 +0000] 13591 MainThread ntp_monitor INFO Using timeout of 2.000000
[27/May/2014 18:13:36 +0000] 13591 MainThread __init__ INFO Importing metric schema from file /usr/lib/cmf/agent/src/cmf/monitor/schema.json
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CMF_PACKAGE_DIR': '/usr/lib/cmf/service', 'CDH_SPARK_HOME': '/usr/lib/spark', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_HUE_HOME': '/usr/lib/hue', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_HIVE_HOME': '/usr/lib/hive', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_FLUME_HOME': '/usr/lib/flume-ng', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'CDH_HCAT_HOME': '/usr/lib/hive-hcatalog', 'CDH_LLAMA_HOME': '/usr/lib/llama/'}
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/process
[27/May/2014 18:13:36 +0000] 13591 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor
"/var/log/cloudera-scm-agent/cloudera-scm-agent.log" [readonly] 105L, 12540C
[27/May/2014 18:13:36 +0000] 13591 MainThread agent ERROR Failed to connect to previous supervisor.
Traceback (most recent call last):
File "/usr/lib/cmf/agent/src/cmf/agent.py", line 1236, in find_or_start_supervisor
self.get_supervisor_process_info()
File "/usr/lib/cmf/agent/src/cmf/agent.py", line 1423, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1578, in __request
verbose=self.__verbose
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 962, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 996, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 818, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 780, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 761, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[27/May/2014 18:13:36 +0000] 13591 MainThread tmpfs INFO Reusing mounted tmpfs at /run/cloudera-scm-agent/process
[27/May/2014 18:13:38 +0000] 13591 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 1)
[27/May/2014 18:13:38 +0000] 13591 MainThread agent INFO Successfully connected to supervisor
[27/May/2014 18:13:38 +0000] 13591 MainThread _cplogging INFO [27/May/2014:18:13:38] ENGINE Bus STARTING
[27/May/2014 18:13:38 +0000] 13591 MainThread _cplogging INFO [27/May/2014:18:13:38] ENGINE Started monitor thread '_TimeoutMonitor'.
[27/May/2014 18:13:38 +0000] 13591 MainThread _cplogging INFO [27/May/2014:18:13:38] ENGINE Serving on hadoop-master.cccs.uwe.ac.uk:9000
[27/May/2014 18:13:38 +0000] 13591 MainThread _cplogging INFO [27/May/2014:18:13:38] ENGINE Bus STARTED
[27/May/2014 18:13:38 +0000] 13591 MainThread __init__ INFO New monitor: (<cmf.monitor.host.HostMonitor object at 0x1c86950>,)
[27/May/2014 18:13:38 +0000] 13591 MainThread agent WARNING Setting default socket timeout to 30!
[27/May/2014 18:13:38 +0000] 13591 MonitorDaemon-Scheduler __init__ INFO Monitor ready to report: ('HostMonitor',)
[27/May/2014 18:13:38 +0000] 13591 MainThread agent INFO Using parcels directory from server provided value: /opt/cloudera/parcels
[27/May/2014 18:13:38 +0000] 13591 MainThread parcel INFO Agent does create users/groups and apply file permissions
[27/May/2014 18:13:38 +0000] 13591 MainThread downloader INFO Downloader path: /opt/cloudera/parcel-cache
[27/May/2014 18:13:38 +0000] 13591 MainThread parcel_cache INFO Using /opt/cloudera/parcel-cache for parcel cache
[27/May/2014 18:13:38 +0000] 13591 MainThread agent INFO Active parcel list updated; recalculating component info.
[27/May/2014 18:13:43 +0000] 13591 Monitor-HostMonitor throttling_logger INFO Using java location: '/usr/lib/jvm/java-7-oracle-cloudera/bin/java'.
[27/May/2014 18:13:43 +0000] 13591 Monitor-HostMonitor throttling_logger ERROR Failed to collect NTP metrics
Traceback (most recent call last):
File "/usr/lib/cmf/agent/src/cmf/monitor/host/ntp_monitor.py", line 39, in collect
result, stdout, stderr = self._subprocess_with_timeout(args, self._timeout)
File "/usr/lib/cmf/agent/src/cmf/monitor/host/ntp_monitor.py", line 32, in _subprocess_with_timeout
return subprocess_with_timeout(args, timeout)
File "/usr/lib/cmf/agent/src/cmf/monitor/host/subprocess_timeout.py", line 40, in subprocess_with_timeout
close_fds=True)
File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1259, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

I did a forum search for this topic and other people have had problems, but their log files are different from mine. Any idea what's going on here?

4 REPLIES

Re: Installation failed. Failed to receive heartbeat from agent.

Explorer
No replies?

Re: Installation failed. Failed to receive heartbeat from agent.

Super Guru

Hello KS,

 

It's hard to tell what is happening based on your logs, but, in general, this means that the agent is unable to tell Cloudera Manager it is alive.  There could be a number of reasons this is happening and the log snippet you provided does not clearly indicate what may be the major contributing factor.
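
One way to narrow a long agent log down is to keep only the ERROR and WARNING entries. Here is a minimal Python sketch, assuming the log line format shown in the post above; the function name `problem_entries` is made up for illustration and is not part of any Cloudera tool.

```python
import re

# Matches entries such as:
# [27/May/2014 18:13:36 +0000] 13591 MainThread agent ERROR Failed to connect to previous supervisor.
ENTRY = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<pid>\d+) (?P<thread>\S+) "
    r"(?P<module>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)


def problem_entries(lines, levels=("ERROR", "WARNING")):
    """Return (timestamp, module, level, message) tuples for the given levels.

    Lines that do not match the entry format (traceback lines, blank
    lines) are skipped.
    """
    found = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            found.append((
                match.group("timestamp"),
                match.group("module"),
                match.group("level"),
                match.group("message"),
            ))
    return found
```

To use it, open /var/log/cloudera-scm-agent/cloudera-scm-agent.log and pass the file object to `problem_entries`. Traceback lines are dropped, so look at the full log around any entry it reports.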

 

One approach might be to try running Host Inspector:

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-...

 

If you log into Cloudera Manager, click on the "Cloudera Manager" logo in the top left, and then click the "Hosts" link in the menu at the top of the page, do you see your host listed?  If so, click the Host Inspector button.

 

Depending on the results there, you can check the stderr and stdout for individual commands or download the results as a JSON file.  I suggest clicking the "Show Inspector Results" button to see if any checks have failed.  If so, which ones?

 

If you have iptables enabled, you may want to shut that down and run the inspector again to see if that resolves the problem.  The firewall could be blocking heartbeat communication.
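
A quick way to see whether a firewall is in the way is to try opening a TCP connection from the agent host to the Cloudera Manager server. This is a rough Python sketch, not an official tool: the function name `can_connect` is made up, and port 7182 is assumed to be the heartbeat port the server listens on (the default, as far as I know; check your own configuration).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout.

    DNS failures, refused connections, and timeouts all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the agent host, something like `can_connect("your-cm-server", 7182)` (hostname is a placeholder) should return True when heartbeats can get through. The reverse check, from the server to the agent, would use port 9000, which is where the agent log above shows it serving.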

 

Re: Installation failed. Failed to receive heartbeat from agent.

Explorer

Hi bgooley,

 

Thank you for your reply. I did as you suggested and here are the stderr and stdout from the inspection:

 

stderr:

+ DDL_DIR=/usr/share/cmf/schema
+ [[ inspector == \f\i\r\e\h\o\s\e ]]
+ [[ inspector == \e\v\e\n\t\s\e\r\v\e\r ]]
+ [[ inspector == \a\l\e\r\t\p\u\b\l\i\s\h\e\r ]]
+ [[ inspector == \h\e\a\d\l\a\m\p ]]
+ [[ inspector == \i\n\s\p\e\c\t\o\r ]]
+ shift
++ pwd
+ MGMT_CLASSPATH='/run/cloudera-scm-agent/process/5-host-inspector:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*'
+ echo_and_exec /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -cp '/run/cloudera-scm-agent/process/5-host-inspector:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*' com.cloudera.cmf.inspector.Inspector input.json output.json DEFAULT
+ echo 'Executing: /usr/lib/jvm/java-7-oracle-cloudera/bin/java' -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -cp '/run/cloudera-scm-agent/process/5-host-inspector:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*' com.cloudera.cmf.inspector.Inspector input.json output.json DEFAULT
+ exec /usr/lib/jvm/java-7-oracle-cloudera/bin/java -server -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:+UseParNewGC -Dmgmt.log.file= -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -cp '/run/cloudera-scm-agent/process/5-host-inspector:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*' com.cloudera.cmf.inspector.Inspector input.json output.json DEFAULT

 

stdout:

[                          main] ExtantInitdInspection          INFO  Could not list directory /etc/init.d/rc2.d
[                          main] ExtantInitdInspection          WARN  Skipping file README.
[                          main] ExtantInitdInspection          INFO  Could not list directory /etc/init.d/rc3.d
[                          main] ExtantInitdInspection          WARN  Skipping file README.
[                          main] ExtantInitdInspection          INFO  Could not list directory /etc/init.d/rc4.d
[                          main] ExtantInitdInspection          WARN  Skipping file README.
[                          main] ExtantInitdInspection          INFO  Could not list directory /etc/init.d/rc5.d
[                          main] Inspector                      INFO  Running inspection: com.cloudera.cmf.inspector.TransparentHugePagesInspection@9896c52
[                          main] Inspector                      INFO  Running inspection: com.cloudera.cmf.inspector.SwappinessInspection@1d268062
{
  "allHostDnsErrors" : [ ],
  "allHostDnsSuccesses" : [ 1 ],
  "allHostsDnsAvgDurationMillis" : 1,
  "allHostsDnsCount" : 2,
  "allHostsDnsMaxDurationMillis" : 2,
  "etcHostsError" : null,
  "etcHostsMessages" : [ ],
  "etcKrbConfMessages" : [ ],
  "extantInitdErrors" : [ ],
  "groupData" : "cloudera-scm:x:110:\n",
  "hostDnsErrors" : [ ],
  "hostname" : "hadoop-master.cccs.uwe.ac.uk",
  "jceStrength" : 0,
  "kernelVersion" : "3.2.0-4-amd64",
  "kernelVersionException" : null,
  "localHostIpError" : null,
  "localhostIp" : "127.0.0.1",
  "nowMillis" : 1401266251008,
  "rhelRelease" : null,
  "runExceptions" : [ ],
  "swappiness" : "60",
  "swappinessException" : null,
  "timeZone" : "UTC+00:00",
  "transparentHugePagesDefrag" : null,
  "transparentHugePagesEnabled" : null,
  "transparentHugePagesException" : null,
  "userData" : "cloudera-scm:x:108:110:Cloudera Manager,,,:/var/run/cloudera-scm-server:/bin/nologin\n"
}

 

The JSON inspection results can be found here.

 

Re: Installation failed. Failed to receive heartbeat from agent.

Super Guru

Hi KS,

 

I didn't see anything notable in the stderr and stdout, but the JSON output did show this:

 

 
  "failedHostsWithError" : {
    "myhost" : "IOException thrown while collecting data from host: myhost"
  },

 

 
 
That indicates that the Manager was unable to connect to the Agent.  So, we should focus on connectivity.
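
If the downloaded results file is long, the failed hosts can be pulled out with a few lines of Python. This is only a sketch: it assumes `failedHostsWithError` sits at the top level of the JSON object, and `failed_hosts` is a name made up for illustration.

```python
import json


def failed_hosts(inspector_json_text):
    """Return the host -> error message mapping from Host Inspector JSON.

    Returns an empty dict when the key is missing or empty.
    Assumes "failedHostsWithError" is a top-level key.
    """
    results = json.loads(inspector_json_text)
    return results.get("failedHostsWithError") or {}
```

Read the downloaded file into a string and pass it in; each key in the result is a host the Manager could not collect data from.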
 
Check things like:
 
* Can you ping myhost from myhost?
* Does the same problem happen if iptables is shut off?
* Is the agent running?
 
There may be other issues, but that is a good start and often where the install can have problems.
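
Alongside ping, it may be worth checking what address the hostname actually resolves to on the host itself. Below is a minimal Python sketch under my own assumptions: the helper names `resolve` and `is_loopback` are made up, and whether a loopback result matters depends on your setup.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the IPv4 address that hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def is_loopback(address):
    """Return True if address is in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback
```

Run `resolve(socket.getfqdn())` on each host. If it returns None, or an address for which `is_loopback` is True, other machines may not be able to reach the host under that name, so /etc/hosts and DNS would be the next place to look.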
 
Regards,
 
Ben