New Contributor
Posts: 1
Registered: ‎04-25-2018

Host monitor not running; unable to restart cloudera-scm-agent due to supervisor connection refused

After a data center outage, my Cloudera cluster running CDH 5.8.5 was not functioning properly, so I took these steps:

  • Logged in to Cloudera Manager, which reported that the Host Monitor and Service Monitor were not reachable.
  • SSH-ed into the cluster nodes, cleared the log files for cloudera-scm-agent and supervisord, then tried to restart the agent (service cloudera-scm-agent restart).
  • The cloudera-scm-agent log is shown below. It shows the connection being refused each time the agent tries to connect to the supervisor process.
  • supervisord produced no new log messages during this time.
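
The refusal described above is a plain TCP connection failure ([Errno 111]), so it can be checked outside the agent. Here is a minimal sketch in Python, assuming the agent's supervisord listens on 127.0.0.1:19001. That address and port are an assumption and do not appear in the log, and `probe_tcp` is a hypothetical helper name used only for illustration:

```python
import errno
import socket

# Assumption: the agent's supervisord listens on 127.0.0.1:19001.
# This is not taken from the log; adjust it for your installation.
SUPERVISOR_HOST = "127.0.0.1"
SUPERVISOR_PORT = 19001


def probe_tcp(host, port, timeout=2.0):
    """Return 'open', 'refused', or 'error: <reason>' for a TCP endpoint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except socket.error as exc:
        # ECONNREFUSED is errno 111 on Linux, the same error the agent logs.
        if exc.errno == errno.ECONNREFUSED:
            return "refused"
        return "error: %s" % exc
    finally:
        sock.close()


if __name__ == "__main__":
    result = probe_tcp(SUPERVISOR_HOST, SUPERVISOR_PORT)
    print("%s:%d -> %s" % (SUPERVISOR_HOST, SUPERVISOR_PORT, result))
```

A result of "refused" would mean nothing is listening on that port, which would point to supervisord not running or exiting right after launch, rather than to a firewall or a hung process.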

Any help would be appreciated here. Thank you!

[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     SCM Agent Version: 5.8.2
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Agent Protocol Version: 4
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Using Host ID: 842ca2b2-6e0a-422b-be27-1f8270defb60
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Using directory: /run/cloudera-scm-agent
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Using supervisor binary path: /usr/lib/cmf/agent/build/env/bin/supervisord
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Neither verify_cert_file nor verify_cert_dir are configured. Not performing validation of server certificates in HTTPS communication. These options can be configured in this agent's config.ini file to enable certificate validation.
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Agent Logging Level: INFO
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     No command line vars
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Found database jar: /usr/share/java/mysql-connector-java.jar
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
[25/Apr/2018 17:25:41 +0000] 3508 MainThread agent        INFO     Agent starting as pid 3508 user root(0) group root(0).
[25/Apr/2018 17:25:43 +0000] 3508 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/cgroups
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Found cgroups subsystem: cpu
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     cgroup pseudofile /tmp/tmpmddUFx/cpu.rt_runtime_us does not exist, skipping
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Found cgroups subsystem: cpuacct
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Found cgroups subsystem: memory
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Found cgroups subsystem: blkio
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/memory
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/cpu
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/cpuacct
[25/Apr/2018 17:25:43 +0000] 3508 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/blkio
[25/Apr/2018 17:25:43 +0000] 3508 MainThread agent        INFO     Found cgroups capabilities: {'has_memory': True, 'default_memory_limit_in_bytes': -1, 'default_memory_soft_limit_in_bytes': -1, 'writable_cgroup_dot_procs': True, 'default_cpu_rt_runtime_us': -1, 'has_cpu': True, 'default_blkio_weight': 1000, 'default_cpu_shares': 1024, 'has_cpuacct': True, 'has_blkio': True}
[25/Apr/2018 17:25:43 +0000] 3508 MainThread agent        INFO     Setting up supervisord event monitor.
[25/Apr/2018 17:25:43 +0000] 3508 MainThread filesystem_map INFO     Monitored nodev filesystem types: ['nfs', 'nfs4', 'tmpfs']
[25/Apr/2018 17:25:43 +0000] 3508 MainThread filesystem_map INFO     Using timeout of 2.000000
[25/Apr/2018 17:25:43 +0000] 3508 MainThread filesystem_map INFO     Using join timeout of 0.100000
[25/Apr/2018 17:25:43 +0000] 3508 MainThread filesystem_map INFO     Using tolerance of 60.000000
[25/Apr/2018 17:25:43 +0000] 3508 MainThread filesystem_map INFO     Local filesystem types whitelist: ['ext2', 'ext3', 'ext4']
[25/Apr/2018 17:25:43 +0000] 3508 MainThread kt_renewer   INFO     Agent wide credential cache set to /run/cloudera-scm-agent/krb5cc_cm_agent_0
[25/Apr/2018 17:25:43 +0000] 3508 MainThread agent        INFO     Using metrics_url_timeout_seconds of 30.000000
[25/Apr/2018 17:25:43 +0000] 3508 MainThread agent        INFO     Using task_metrics_timeout_seconds of 5.000000
[25/Apr/2018 17:25:43 +0000] 3508 MainThread agent        INFO     Using max_collection_wait_seconds of 10.000000
[25/Apr/2018 17:25:43 +0000] 3508 MainThread metrics      INFO     Importing tasktracker metric schema from file /usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/monitor/tasktracker/schema.json
[25/Apr/2018 17:25:43 +0000] 3508 MainThread ntp_monitor  INFO     Using timeout of 2.000000
[25/Apr/2018 17:25:44 +0000] 3508 MainThread dns_names    INFO     Using timeout of 30.000000
[25/Apr/2018 17:25:44 +0000] 3508 MainThread __init__     INFO     Created DNS monitor.
[25/Apr/2018 17:25:44 +0000] 3508 MainThread stacks_collection_manager INFO     Using max_uncompressed_file_size_bytes: 5242880
[25/Apr/2018 17:25:44 +0000] 3508 MainThread __init__     INFO     Importing metric schema from file /usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/monitor/schema.json
[25/Apr/2018 17:25:44 +0000] 3508 MainThread agent        INFO     Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'CMF_PACKAGE_DIR': '/usr/lib/cmf/service', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'KEYTRUSTEE_KP_HOME': '/usr/share/keytrustee-keyprovider', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'KEYTRUSTEE_SERVER_HOME': '/usr/lib/keytrustee-server', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_KMS_HOME': '/usr/lib/hadoop-kms', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HUE_HOME': '/usr/lib/hue', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SENTRY_HOME': '/usr/lib/sentry', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_LLAMA_HOME': '/usr/lib/llama/', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'CDH_HIVE_HOME': '/usr/lib/hive', 'CDH_HCAT_HOME': 
'/usr/lib/hive-hcatalog', 'CDH_KAFKA_HOME': '/usr/lib/kafka', 'CDH_SPARK_HOME': '/usr/lib/spark', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_FLUME_HOME': '/usr/lib/flume-ng'}
[25/Apr/2018 17:25:44 +0000] 3508 MainThread agent        INFO     To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[25/Apr/2018 17:25:44 +0000] 3508 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/process
[25/Apr/2018 17:25:44 +0000] 3508 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor
[25/Apr/2018 17:25:44 +0000] 3508 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/flood
[25/Apr/2018 17:25:44 +0000] 3508 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor/include
[25/Apr/2018 17:25:44 +0000] 3508 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2084, in find_or_start_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:25:44 +0000] 3508 MainThread tmpfs        INFO     Reusing mounted tmpfs at /run/cloudera-scm-agent/process
[25/Apr/2018 17:25:45 +0000] 3508 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 1)
[25/Apr/2018 17:25:45 +0000] 3508 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:25:46 +0000] 3508 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 2)
[25/Apr/2018 17:25:46 +0000] 3508 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:25:47 +0000] 3508 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 3)
[25/Apr/2018 17:25:47 +0000] 3508 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:25:48 +0000] 3508 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 4)
[25/Apr/2018 17:25:48 +0000] 3508 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:25:49 +0000] 3508 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 5)
[25/Apr/2018 17:25:49 +0000] 3508 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:25:49 +0000] 3508 MainThread agent        ERROR    Failed to connect to newly launched supervisor. Agent will exit
[25/Apr/2018 17:25:49 +0000] 3508 MainThread agent        INFO     Stopping agent...
[25/Apr/2018 17:25:49 +0000] 3508 MainThread agent        INFO     No extant cgroups; unmounting any cgroup roots
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     SCM Agent Version: 5.8.2
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Agent Protocol Version: 4
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Using Host ID: 842ca2b2-6e0a-422b-be27-1f8270defb60
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Using directory: /run/cloudera-scm-agent
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Using supervisor binary path: /usr/lib/cmf/agent/build/env/bin/supervisord
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Neither verify_cert_file nor verify_cert_dir are configured. Not performing validation of server certificates in HTTPS communication. These options can be configured in this agent's config.ini file to enable certificate validation.
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Agent Logging Level: INFO
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     No command line vars
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Found database jar: /usr/share/java/mysql-connector-java.jar
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Missing database jar: /usr/share/java/oracle-connector-java.jar (normal, if you're not using this database type)
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Found database jar: /usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
[25/Apr/2018 17:26:17 +0000] 4015 MainThread agent        INFO     Agent starting as pid 4015 user root(0) group root(0).
[25/Apr/2018 17:26:19 +0000] 4015 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/cgroups
[25/Apr/2018 17:26:19 +0000] 4015 MainThread cgroups      INFO     Found cgroups subsystem: cpu
[25/Apr/2018 17:26:19 +0000] 4015 MainThread cgroups      INFO     cgroup pseudofile /tmp/tmpodxfKH/cpu.rt_runtime_us does not exist, skipping
[25/Apr/2018 17:26:19 +0000] 4015 MainThread cgroups      INFO     Found cgroups subsystem: cpuacct
[25/Apr/2018 17:26:20 +0000] 4015 MainThread cgroups      INFO     Found cgroups subsystem: memory
[25/Apr/2018 17:26:20 +0000] 4015 MainThread cgroups      INFO     Found cgroups subsystem: blkio
[25/Apr/2018 17:26:20 +0000] 4015 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/memory
[25/Apr/2018 17:26:20 +0000] 4015 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/cpu
[25/Apr/2018 17:26:20 +0000] 4015 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/cpuacct
[25/Apr/2018 17:26:20 +0000] 4015 MainThread cgroups      INFO     Reusing /run/cloudera-scm-agent/cgroups/blkio
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Found cgroups capabilities: {'has_memory': True, 'default_memory_limit_in_bytes': -1, 'default_memory_soft_limit_in_bytes': -1, 'writable_cgroup_dot_procs': True, 'default_cpu_rt_runtime_us': -1, 'has_cpu': True, 'default_blkio_weight': 1000, 'default_cpu_shares': 1024, 'has_cpuacct': True, 'has_blkio': True}
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Setting up supervisord event monitor.
[25/Apr/2018 17:26:20 +0000] 4015 MainThread filesystem_map INFO     Monitored nodev filesystem types: ['nfs', 'nfs4', 'tmpfs']
[25/Apr/2018 17:26:20 +0000] 4015 MainThread filesystem_map INFO     Using timeout of 2.000000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread filesystem_map INFO     Using join timeout of 0.100000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread filesystem_map INFO     Using tolerance of 60.000000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread filesystem_map INFO     Local filesystem types whitelist: ['ext2', 'ext3', 'ext4']
[25/Apr/2018 17:26:20 +0000] 4015 MainThread kt_renewer   INFO     Agent wide credential cache set to /run/cloudera-scm-agent/krb5cc_cm_agent_0
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Using metrics_url_timeout_seconds of 30.000000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Using task_metrics_timeout_seconds of 5.000000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Using max_collection_wait_seconds of 10.000000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread metrics      INFO     Importing tasktracker metric schema from file /usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/monitor/tasktracker/schema.json
[25/Apr/2018 17:26:20 +0000] 4015 MainThread ntp_monitor  INFO     Using timeout of 2.000000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread dns_names    INFO     Using timeout of 30.000000
[25/Apr/2018 17:26:20 +0000] 4015 MainThread __init__     INFO     Created DNS monitor.
[25/Apr/2018 17:26:20 +0000] 4015 MainThread stacks_collection_manager INFO     Using max_uncompressed_file_size_bytes: 5242880
[25/Apr/2018 17:26:20 +0000] 4015 MainThread __init__     INFO     Importing metric schema from file /usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/monitor/schema.json
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'CMF_PACKAGE_DIR': '/usr/lib/cmf/service', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'KEYTRUSTEE_KP_HOME': '/usr/share/keytrustee-keyprovider', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'KEYTRUSTEE_SERVER_HOME': '/usr/lib/keytrustee-server', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_KMS_HOME': '/usr/lib/hadoop-kms', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HUE_HOME': '/usr/lib/hue', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SENTRY_HOME': '/usr/lib/sentry', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_LLAMA_HOME': '/usr/lib/llama/', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'CDH_HIVE_HOME': '/usr/lib/hive', 'CDH_HCAT_HOME': 
'/usr/lib/hive-hcatalog', 'CDH_KAFKA_HOME': '/usr/lib/kafka', 'CDH_SPARK_HOME': '/usr/lib/spark', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_FLUME_HOME': '/usr/lib/flume-ng'}
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/process
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/flood
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        INFO     Re-using pre-existing directory: /run/cloudera-scm-agent/supervisor/include
[25/Apr/2018 17:26:20 +0000] 4015 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2084, in find_or_start_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:26:20 +0000] 4015 MainThread tmpfs        INFO     Reusing mounted tmpfs at /run/cloudera-scm-agent/process
[25/Apr/2018 17:26:21 +0000] 4015 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 1)
[25/Apr/2018 17:26:21 +0000] 4015 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:26:22 +0000] 4015 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 2)
[25/Apr/2018 17:26:22 +0000] 4015 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:26:23 +0000] 4015 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 3)
[25/Apr/2018 17:26:23 +0000] 4015 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:26:24 +0000] 4015 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 4)
[25/Apr/2018 17:26:24 +0000] 4015 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:26:25 +0000] 4015 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 5)
[25/Apr/2018 17:26:25 +0000] 4015 MainThread agent        ERROR    Failed! trying again in 1 second(s)
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2208, in connect_to_new_supervisor
    self.get_supervisor_process_info()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.2-py2.7.egg/cmf/agent.py", line 2230, in get_supervisor_process_info
    self.identifier = self.supervisor_client.supervisor.getIdentification()
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib/python2.7/httplib.py", line 1017, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1051, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[25/Apr/2018 17:26:25 +0000] 4015 MainThread agent        ERROR    Failed to connect to newly launched supervisor. Agent will exit
[25/Apr/2018 17:26:25 +0000] 4015 MainThread agent        INFO     Stopping agent...
[25/Apr/2018 17:26:25 +0000] 4015 MainThread agent        INFO     No extant cgroups; unmounting any cgroup roots

 

I also found this output in /var/log/cloudera-scm-agent/supervisord.out. It looks like supervisord itself fails at startup with an ImportError because the module diewithparent cannot be found, which would explain why the agent's connection attempts are refused.

Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/bin/supervisord", line 12, in <module>
    load_entry_point('supervisor==3.0', 'console_scripts', 'supervisord')()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 558, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2682, in load_entry_point
    return ep.load()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2355, in load
    return self.resolve()
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2361, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/supervisord.py", line 41, in <module>
    from supervisor.options import ServerOptions
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/supervisor-3.0-py2.7.egg/supervisor/options.py", line 24, in <module>
    import diewithparent
ImportError: No module named diewithparent
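
To confirm whether the import failure comes from the agent's bundled Python environment rather than from supervisord's configuration, a small check like the one below may help. It is only a sketch: the interpreter path /usr/lib/cmf/agent/build/env/bin/python is an assumption inferred from the supervisord path in the logs above, and the helper name module_importable is made up for illustration.

```python
import os
import subprocess
import sys


def module_importable(python_bin, module):
    """Return True if `python_bin -c "import <module>"` exits with status 0."""
    with open(os.devnull, "w") as devnull:
        try:
            rc = subprocess.call(
                [python_bin, "-c", "import " + module],
                stdout=devnull,
                stderr=devnull,
            )
        except OSError:
            # The interpreter itself is missing or not executable.
            return False
    return rc == 0


if __name__ == "__main__":
    # Assumed location of the agent's bundled interpreter (not confirmed by the logs).
    agent_python = "/usr/lib/cmf/agent/build/env/bin/python"
    python_bin = sys.argv[1] if len(sys.argv) > 1 else agent_python
    for name in ("supervisor", "diewithparent"):
        status = "ok" if module_importable(python_bin, name) else "NOT importable"
        print("%s: %s" % (name, status))
```

If supervisor imports but diewithparent does not, the agent's environment is probably incomplete or damaged, which points at the installed agent files rather than at anything supervisord logs at runtime.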
Expert Contributor
Posts: 125
Registered: ‎07-17-2017

Re: Host monitor not running; unable to restart cloudera-scm-agent due to supervisor connection refu

Hi @brandonvin

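Try a hard stop followed by a start. Note that a hard stop also kills supervisord and the service processes it manages on that host: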

service cloudera-scm-agent hard_stop_confirmed
service cloudera-scm-agent start

Or, as a single command:

service cloudera-scm-agent hard_restart

Good luck.
