cloudera_scm_agent fails to start

Rising Star

Hi guys

We have a cluster on AWS with EC2 instances 

1 NN (r3.4xlarge)

DN1 DN2 DN3 DN4 (r3.8xlarge)

Ubuntu 12.04.4 LTS

Cloudera Manager CM installation 

CDH 5.8.0

 

We have been facing an odd problem since last week. We would start a Spark Scala job, and within 10 minutes DN2 would become unreachable via SSH and the job would eventually hang.

 

I upgraded DN2 (just this one node) to Ubuntu 14.04.5 LTS. Now, when I start the cluster, none of the CDH components start on this node:

- cloudera agent

- datanode

- node manager

- region server

The log file is below. I do see "Connection refused"

 

CLOUDERA-SCM-AGENT

====================

[16/Nov/2016 02:17:45 +0000] 2207 Dummy-1 agent INFO Cleaning up daemon
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO SCM Agent Version: 5.8.1
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Agent Protocol Version: 4
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Using Host ID: 832b2fcf-426f-4fba-a8fe-0cae3e239957
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Using directory: /run/cloudera-scm-agent
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Using supervisor binary path: /usr/lib/cmf/agent/build/env/bin/supervisord
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Neither verify_cert_file nor verify_cert_dir are configured. Not performing validation of server certificates in HTTPS communication. These options can be configured in this agent's config.ini file to enable certificate validation.
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Agent Logging Level: INFO
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO No command line vars
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Missing database jar: /usr/share/java/mysql-connector-java.jar (normal, if you're not using this database type)
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Agent starting as pid 3573 user root(0) group root(0).
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent WARNING Expected mode 0751 for /run/cloudera-scm-agent but was 0755
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/cgroups
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Found cgroups subsystem: cpu
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO cgroup pseudofile /tmp/tmpBDmF82/cpu.rt_runtime_us does not exist, skipping
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Found cgroups subsystem: cpuacct
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Found cgroups subsystem: memory
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Found cgroups subsystem: blkio
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Reusing /run/cloudera-scm-agent/cgroups/memory
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Reusing /run/cloudera-scm-agent/cgroups/cpu
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Reusing /run/cloudera-scm-agent/cgroups/cpuacct
[16/Nov/2016 04:52:01 +0000] 3573 MainThread cgroups INFO Reusing /run/cloudera-scm-agent/cgroups/blkio
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Found cgroups capabilities: {'has_memory': True, 'default_memory_limit_in_bytes': -1, 'default_memory_soft_limit_in_bytes': -1, 'writable_cgroup_dot_procs': True, 'default_cpu_rt_runtime_us': -1, 'has_cpu': True, 'default_blkio_weight': 1000, 'default_cpu_shares': 1024, 'has_cpuacct': True, 'has_blkio': True}
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Setting up supervisord event monitor.
[16/Nov/2016 04:52:01 +0000] 3573 MainThread filesystem_map INFO Monitored nodev filesystem types: ['nfs', 'nfs4', 'tmpfs']
[16/Nov/2016 04:52:01 +0000] 3573 MainThread filesystem_map INFO Using timeout of 2.000000
[16/Nov/2016 04:52:01 +0000] 3573 MainThread filesystem_map INFO Using join timeout of 0.100000
[16/Nov/2016 04:52:01 +0000] 3573 MainThread filesystem_map INFO Using tolerance of 60.000000
[16/Nov/2016 04:52:01 +0000] 3573 MainThread filesystem_map INFO Local filesystem types whitelist: ['ext2', 'ext3', 'ext4']
[16/Nov/2016 04:52:01 +0000] 3573 MainThread kt_renewer INFO Agent wide credential cache set to /run/cloudera-scm-agent/krb5cc_cm_agent_0
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Using metrics_url_timeout_seconds of 30.000000
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Using task_metrics_timeout_seconds of 5.000000
[16/Nov/2016 04:52:01 +0000] 3573 MainThread agent INFO Using max_collection_wait_seconds of 10.000000
[16/Nov/2016 04:52:01 +0000] 3573 MainThread metrics INFO Importing tasktracker metric schema from file /usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/monitor/tasktracker/schema.json
[16/Nov/2016 04:52:01 +0000] 3573 MainThread ntp_monitor INFO Using timeout of 2.000000
[16/Nov/2016 04:52:02 +0000] 3573 MainThread dns_names INFO Using timeout of 30.000000
[16/Nov/2016 04:52:02 +0000] 3573 MainThread __init__ INFO Created DNS monitor.
[16/Nov/2016 04:52:02 +0000] 3573 MainThread stacks_collection_manager INFO Using max_uncompressed_file_size_bytes: 5242880
[16/Nov/2016 04:52:02 +0000] 3573 MainThread __init__ INFO Importing metric schema from file /usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/monitor/schema.json
[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent INFO Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'CMF_PACKAGE_DIR': '/usr/lib/cmf/service', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'KEYTRUSTEE_KP_HOME': '/usr/share/keytrustee-keyprovider', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'KEYTRUSTEE_SERVER_HOME': '/usr/lib/keytrustee-server', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_KMS_HOME': '/usr/lib/hadoop-kms', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HUE_HOME': '/usr/lib/hue', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SENTRY_HOME': '/usr/lib/sentry', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_LLAMA_HOME': '/usr/lib/llama/', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'CDH_HIVE_HOME': '/usr/lib/hive', 'CDH_HCAT_HOME': '/usr/lib/hive-hcatalog', 
'CDH_KAFKA_HOME': '/usr/lib/kafka', 'CDH_SPARK_HOME': '/usr/lib/spark', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_FLUME_HOME': '/usr/lib/flume-ng'}
[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent INFO To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/process
[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor
[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/flood
[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent INFO Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor/include
[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent ERROR Failed to connect to previous supervisor.
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2039, in find_or_start_supervisor
self.get_supervisor_process_info()
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2185, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[16/Nov/2016 04:52:02 +0000] 3573 MainThread tmpfs INFO Reusing mounted tmpfs at /run/cloudera-scm-agent/process
[16/Nov/2016 04:52:03 +0000] 3573 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 1)
[16/Nov/2016 04:52:03 +0000] 3573 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2163, in connect_to_new_supervisor
self.get_supervisor_process_info()
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2185, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[16/Nov/2016 04:52:04 +0000] 3573 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 2)
[16/Nov/2016 04:52:04 +0000] 3573 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2163, in connect_to_new_supervisor
self.get_supervisor_process_info()
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2185, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[16/Nov/2016 04:52:05 +0000] 3573 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 3)
[16/Nov/2016 04:52:05 +0000] 3573 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2163, in connect_to_new_supervisor
self.get_supervisor_process_info()
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2185, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[16/Nov/2016 04:52:06 +0000] 3573 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 4)
[16/Nov/2016 04:52:06 +0000] 3573 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2163, in connect_to_new_supervisor
self.get_supervisor_process_info()
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2185, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[16/Nov/2016 04:52:07 +0000] 3573 MainThread agent INFO Trying to connect to newly launched supervisor (Attempt 5)
[16/Nov/2016 04:52:07 +0000] 3573 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2163, in connect_to_new_supervisor
self.get_supervisor_process_info()
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2185, in get_supervisor_process_info
self.identifier = self.supervisor_client.supervisor.getIdentification()
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
self.connection.request('POST', handler, request_body, self.headers)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[16/Nov/2016 04:52:07 +0000] 3573 MainThread agent ERROR Failed to connect to newly launched supervisor. Agent will exit
[16/Nov/2016 04:52:07 +0000] 3573 MainThread agent INFO Stopping agent...
[16/Nov/2016 04:52:07 +0000] 3573 MainThread agent INFO No extant cgroups; unmounting any cgroup roots
[16/Nov/2016 04:52:07 +0000] 3573 MainThread agent INFO Cleaning up daemon
[16/Nov/2016 04:52:07 +0000] 3573 Dummy-1 agent INFO Stopping agent...
[16/Nov/2016 04:52:07 +0000] 3573 Dummy-1 agent INFO No extant cgroups; unmounting any cgroup roots
[16/Nov/2016 04:52:07 +0000] 3573 Dummy-1 agent ERROR Shutdown callback failed.
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2777, in stop
f()
File "/usr/lib/python2.7/asyncore.py", line 409, in close
self.socket.close()
File "/usr/lib/python2.7/asyncore.py", line 636, in close
os.close(self.fd)
OSError: [Errno 9] Bad file descriptor
[16/Nov/2016 04:52:07 +0000] 3573 Dummy-1 agent INFO Cleaning up daemon

2 REPLIES

Re: cloudera_scm_agent fails to start

Cloudera Employee

Hi sanjumani,

 

 

The Cloudera Manager Agent actually consists of two parts:

- the CM Agent itself, which is responsible for communicating with CM

- supervisord, a supervisor process that "owns" the processes that CM starts on a host via the agent.

 

With this split architecture, Hadoop services can keep running even if you need to restart or upgrade the agent.

 

When the Agent starts up, it first checks whether an existing supervisord is alive by trying to connect to it.

If this is the first start after a reboot, no supervisord will be running, so the first "Connection refused" message:

 

[16/Nov/2016 04:52:02 +0000] 3573 MainThread agent ERROR Failed to connect to previous supervisor.
Traceback (most recent call last):

 

is expected. After this, the Agent tries to launch a new supervisord and connect to it. That is where your startup is actually failing:

[16/Nov/2016 04:52:03 +0000] 3573 MainThread agent ERROR Failed! trying again in 1 second(s)
Traceback (most recent call last):

 

After five unsuccessful attempts, the agent gives up and shuts itself down.
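The startup behaviour described above can be sketched roughly as follows. This is only an illustration of the retry pattern visible in the log (five attempts, one second apart), not the agent's actual code. The function name `connect_with_retries` and the `probe` callable are hypothetical names chosen for this example.

```python
import time


def connect_with_retries(probe, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing between failures.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt raised ConnectionRefusedError (the caller would then
    give up, as the agent does in the log above).
    """
    for attempt in range(1, attempts + 1):
        try:
            probe()
            return attempt
        except ConnectionRefusedError:
            # Corresponds to the "Failed! trying again in 1 second(s)" lines.
            if attempt < attempts:
                sleep(delay_seconds)
    return None
```

Here `probe` stands in for whatever call checks that supervisord answers, for example a TCP connect to its listening port.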

 

There should also be a supervisord log next to the agent log. I would check it to see whether supervisord is failing to start up, or whether something else went wrong during the upgrade.
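As a rough way to skim that log, a small filter like the one below could pull out the lines most likely to matter. It is a sketch under assumptions: the marker strings are a guess at what error lines tend to contain, and the log path mentioned in the note after the code is an assumed default that may differ on your host.

```python
def suspicious_lines(lines, markers=("CRIT", "ERRO", "Traceback", "Error")):
    """Return the lines that contain any of the given marker strings.

    `lines` can be any iterable of strings, such as an open file handle.
    The default markers are assumptions, not an exhaustive list.
    """
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in markers)
    ]
```

To use it, open the supervisord log (assumed here to be something like /var/log/cloudera-scm-agent/supervisord.log) with `open(path, errors="replace")` and pass the file handle to `suspicious_lines`.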

 

 

cheers,

zegab

 

 

 


Re: cloudera_scm_agent fails to start

Super Guru

Make sure you can connect to localhost on port 19001 (the supervisord listening port), and that the supervisor actually started and is listening on that port. If you did an OS upgrade, things like "iptables" and "selinux" may be interfering here. Determining whether the supervisor is failing to start, or whether it starts but the agent cannot reach it, is an important troubleshooting step.
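A quick way to test the first part is a plain TCP connect from the node itself. The snippet below is a minimal sketch using only the Python standard library. The helper name `port_is_listening` is made up for this example, and the host and port defaults simply reflect the values mentioned above.

```python
import socket


def port_is_listening(host="localhost", port=19001, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False while a supervisord process is visible in the process list, a firewall rule or a bind-address problem is the more likely culprit. If no supervisord process exists at all, the supervisor itself is not starting.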

 

Ben
