New Contributor
Posts: 5
Registered: ‎08-16-2017
Accepted Solution

Failed to connect to previous supervisor after cluster upgrade to 5.12

Hi, All 

 

I was upgrading my dev cluster of 5 hosts from 5.11.0 to 5.12.0.

I have 4 nodes running HDFS/YARN roles, plus one host that is in the cluster but runs only the Cloudera agent.

The 4 HDFS/YARN hosts upgraded to 5.12 flawlessly.

However I had to upgrade the host with agent only manually.

What I did was essentially the following:

1. Stopped the 5.11 agent service

2. Downloaded the 5.12 agent and daemon RPMs 

3. Removed the old agent 

4. Installed new RPMs 

5. Corrected the config.ini to point to the proper server_host

6. Started the agent 
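For reference, the steps above can be sketched as shell commands. This is a minimal sketch of what I ran, not an official procedure: the exact RPM filenames are illustrative (use the ones you downloaded), and `cm-server.example.com` is a placeholder for your real Cloudera Manager host.

```shell
# 1. Stop the 5.11 agent
service cloudera-scm-agent stop

# 2-4. Remove the old agent and install the downloaded 5.12 RPMs
#      (filenames below are illustrative)
yum remove -y cloudera-manager-agent
yum localinstall -y cloudera-manager-daemons-5.12.0-*.rpm \
                    cloudera-manager-agent-5.12.0-*.rpm

# 5. Point the agent at the Cloudera Manager server
#    (cm-server.example.com is a placeholder)
sed -i 's/^server_host=.*/server_host=cm-server.example.com/' \
    /etc/cloudera-scm-agent/config.ini

# 6. Start the agent
service cloudera-scm-agent start
```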

 

 

And it friggin' failed with the following error message:

[15/Aug/2017 18:57:37 +0000] 51524 MainThread kt_renewer   INFO     Agent wide credential cache set to /var/run/cloudera-scm-agent/krb5cc_cm_agent_0
[15/Aug/2017 18:57:37 +0000] 51524 MainThread agent        INFO     Using metrics_url_timeout_seconds of 30.000000
[15/Aug/2017 18:57:37 +0000] 51524 MainThread agent        INFO     Using task_metrics_timeout_seconds of 5.000000
[15/Aug/2017 18:57:37 +0000] 51524 MainThread agent        INFO     Using max_collection_wait_seconds of 10.000000
[15/Aug/2017 18:57:37 +0000] 51524 MainThread metrics      INFO     Importing tasktracker metric schema from file /usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.12.0-py2.6.egg/cmf/monitor/tasktracker/schema.json
[15/Aug/2017 18:57:38 +0000] 51524 MainThread tcp_metrics  WARNING  File '/proc/net/tcp6' couldn't be opened for tcp statistic collection, error=2
[15/Aug/2017 18:57:38 +0000] 51524 MainThread ntp_monitor  INFO     Using timeout of 2.000000
[15/Aug/2017 18:57:38 +0000] 51524 MainThread dns_names    INFO     Using timeout of 30.000000
[15/Aug/2017 18:57:38 +0000] 51524 MainThread __init__     INFO     Created DNS monitor.
[15/Aug/2017 18:57:38 +0000] 51524 MainThread stacks_collection_manager INFO     Using max_uncompressed_file_size_bytes: 5242880
[15/Aug/2017 18:57:38 +0000] 51524 MainThread __init__     INFO     Importing metric schema from file /usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.12.0-py2.6.egg/cmf/monitor/schema.json
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        INFO     Supervised processes will add the following to their environment (in addition to the supervisor's env): {'CDH_PARQUET_HOME': '/usr/lib/parquet', 'JSVC_HOME': '/usr/libexec/bigtop-utils', 'CMF_PACKAGE_DIR': '/usr/lib64/cmf/service', 'CDH_HADOOP_BIN': '/usr/bin/hadoop', 'MGMT_HOME': '/usr/share/cmf', 'CDH_IMPALA_HOME': '/usr/lib/impala', 'CDH_YARN_HOME': '/usr/lib/hadoop-yarn', 'CDH_HDFS_HOME': '/usr/lib/hadoop-hdfs', 'PATH': '/sbin:/usr/sbin:/bin:/usr/bin', 'CDH_HUE_PLUGINS_HOME': '/usr/lib/hadoop', 'CM_STATUS_CODES': u'STATUS_NONE HDFS_DFS_DIR_NOT_EMPTY HBASE_TABLE_DISABLED HBASE_TABLE_ENABLED JOBTRACKER_IN_STANDBY_MODE YARN_RM_IN_STANDBY_MODE', 'KEYTRUSTEE_KP_HOME': '/usr/share/keytrustee-keyprovider', 'CLOUDERA_ORACLE_CONNECTOR_JAR': '/usr/share/java/oracle-connector-java.jar', 'CDH_SQOOP2_HOME': '/usr/lib/sqoop2', 'KEYTRUSTEE_SERVER_HOME': '/usr/lib/keytrustee-server', 'CDH_MR2_HOME': '/usr/lib/hadoop-mapreduce', 'HIVE_DEFAULT_XML': '/etc/hive/conf.dist/hive-default.xml', 'CLOUDERA_POSTGRESQL_JDBC_JAR': '/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar', 'CDH_KMS_HOME': '/usr/lib/hadoop-kms', 'CDH_HBASE_HOME': '/usr/lib/hbase', 'CDH_SQOOP_HOME': '/usr/lib/sqoop', 'WEBHCAT_DEFAULT_XML': '/etc/hive-webhcat/conf.dist/webhcat-default.xml', 'CDH_OOZIE_HOME': '/usr/lib/oozie', 'CDH_ZOOKEEPER_HOME': '/usr/lib/zookeeper', 'CDH_HUE_HOME': '/usr/lib/hue', 'CLOUDERA_MYSQL_CONNECTOR_JAR': '/usr/share/java/mysql-connector-java.jar', 'CDH_HBASE_INDEXER_HOME': '/usr/lib/hbase-solr', 'CDH_MR1_HOME': '/usr/lib/hadoop-0.20-mapreduce', 'CDH_SOLR_HOME': '/usr/lib/solr', 'CDH_PIG_HOME': '/usr/lib/pig', 'CDH_SENTRY_HOME': '/usr/lib/sentry', 'CDH_CRUNCH_HOME': '/usr/lib/crunch', 'CDH_LLAMA_HOME': '/usr/lib/llama/', 'CDH_HTTPFS_HOME': '/usr/lib/hadoop-httpfs', 'CDH_HADOOP_HOME': '/usr/lib/hadoop', 'CDH_HIVE_HOME': '/usr/lib/hive', 'ORACLE_HOME': '/usr/share/oracle/instantclient', 'CDH_HCAT_HOME': 
'/usr/lib/hive-hcatalog', 'CDH_KAFKA_HOME': '/usr/lib/kafka', 'CDH_SPARK_HOME': '/usr/lib/spark', 'TOMCAT_HOME': '/usr/lib/bigtop-tomcat', 'CDH_FLUME_HOME': '/usr/lib/flume-ng'}
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        INFO     To override these variables, use /etc/cloudera-scm-agent/config.ini. Environment variables for CDH locations are not used when CDH is installed from parcels.
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        INFO     Re-using pre-existing directory: /var/run/cloudera-scm-agent/process
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        INFO     Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        INFO     Re-using pre-existing directory: /var/run/cloudera-scm-agent/flood
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        INFO     Re-using pre-existing directory: /var/run/cloudera-scm-agent/supervisor/include
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        INFO     Conf directory: /var/run/cloudera-scm-agent/supervisor
[15/Aug/2017 18:57:39 +0000] 51524 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.12.0-py2.6.egg/cmf/agent.py", line 2109, in find_or_start_supervisor
    self.configure_supervisor_clients()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/cmf-5.12.0-py2.6.egg/cmf/agent.py", line 2291, in configure_supervisor_clients
    supervisor_options.realize(args=["-c", os.path.join(self.supervisor_dir, "supervisord.conf")])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 1599, in realize
    Options.realize(self, *arg, **kw)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 333, in realize
    self.process_config()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 341, in process_config
    self.process_config_file(do_usage)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 376, in process_config_file
    self.usage(str(msg))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 164, in usage
    self.exit(2)
SystemExit: 2
[15/Aug/2017 18:57:39 +0000] 51524 Dummy-1 daemonize    WARNING  Stopping daemon.
[15/Aug/2017 18:57:39 +0000] 51524 Dummy-1 agent        INFO     Stopping agent...
[15/Aug/2017 18:57:39 +0000] 51524 Dummy-1 agent        INFO     No extant cgroups; unmounting any cgroup roots

 

The error "Failed to connect to previous supervisor" appeared, similar to the one in https://community.cloudera.com/t5/Cloudera-Manager-Installation/Failed-to-connect-to-previous-superv...

 

The question is: how do I upgrade this node to the proper 5.12 agent?

 

 

 

P.S. I was finally able to work around the issue temporarily by downgrading the agent to 5.11.1. Note that it has to be 5.11.1: when I tried to downgrade to 5.11.0, the original version, the node disappeared from the cluster.
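For anyone hitting the same thing, the temporary downgrade can be sketched like this (assumes the 5.11.1 packages are available in your configured yum repo; version strings are illustrative):

```shell
# Temporary workaround: pin the agent back to 5.11.1.
# Note: 5.11.1 specifically -- downgrading to 5.11.0 made
# the node disappear from the cluster in my case.
service cloudera-scm-agent stop
yum downgrade -y cloudera-manager-agent-5.11.1 \
               cloudera-manager-daemons-5.11.1
service cloudera-scm-agent start
```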

But I still need a permanent fix here :)))

Cloudera Employee
Posts: 212
Registered: ‎07-08-2013

Re: Failed to connect to previous supervisor after cluster upgrade to 5.12

Hi dekan,

 

It looks like you're running into the same issue as the following fellow community members [0].

Can you try the workaround in [1]?

 

[0] https://community.cloudera.com/t5/Cloudera-Manager-Installation/Cloudera-Manager-Heartbeat-Python-Su...

[1] http://community.cloudera.com/t5/Cloudera-Manager-Installation/CDH-5-12-0-clouder-manager-agent-can-...

New Contributor
Posts: 5
Registered: ‎08-16-2017

Re: Failed to connect to previous supervisor after cluster upgrade to 5.12

[ Edited ]

Thank you, Michalis

This looks exactly like my case!

Unfortunately I can't use the workaround provided in [1], as there is other software running on the host that uses tmpfs.

Do I understand correctly that this issue will be fixed in a later 5.12 release?

 

Cloudera Employee
Posts: 212
Registered: ‎07-08-2013

Re: Failed to connect to previous supervisor after cluster upgrade to 5.12

Quote: "...this issue will be fixed in a later 5.12 release?"

Yes, the fix is already committed; we aim to have it available in the next 5.12.x maintenance release.

Contributor
Posts: 60
Registered: ‎06-03-2014

Re: Failed to connect to previous supervisor after cluster upgrade to 5.12

Michalis,

 

Do you know when the new 5.12.x maintenance release will be available? I have this problem and would like to see it fixed, but I can't wait too long, as I have nodes that are not connected to my CM. How far off is this release?

 

Kevin

Cloudera Employee
Posts: 212
Registered: ‎07-08-2013

Re: Failed to connect to previous supervisor after cluster upgrade to 5.12

Quote: "Do you know when the new 5.12.x maintenance release will be available?"

It's available now; [ANNOUNCE] Cloudera Enterprise 5.12.1 Released [0]

 

 

[0] http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Cloudera-Enterprise-5-12-1-Released...
