
Cloudera Management Services fails to start after upgrade



Morning everyone,

After a parcel-based upgrade using the upgrade wizard from cdh-5.9.0-1.cdh5.9.0.p0.23 to cdh-5.9.1-1.cdh5.9.1.p0.4, the Cloudera Management Service fails to start. All four roles (Service Monitor, Host Monitor, Event Server and Alert Publisher) show "Connection refused" errors in their logs. We're not sure why we're suddenly getting "Connection refused": SSH between all hosts has been checked and is fine, and there's no firewall between the hosts.

 

Environment:

  1. Ten-node virtual cluster on Ubuntu 12.04
  2. Passwordless SSH set up on all machines, checked and OK
  3. Management Service on a dedicated node
  4. Three service nodes (ZooKeeper, NameNode and Secondary NameNode, JournalNodes, Hive and Hue services, Oozie service, etc.)
  5. Database schemas for the Management Service, Hue, Hive and Oozie on a MySQL 5.5 database on a remote server
  6. Six worker nodes (DataNodes and NodeManagers)

The cluster was working fine until the upgrade.

Log tails are attached below, but the information is a little sparse. Is there anywhere else we should be looking?
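
One check that may help narrow this down: a passing SSH test only shows that port 22 is reachable, whereas the role logs below fail against port 7180 (Service Monitor, Host Monitor, Event Server) and port 7184 (Alert Publisher) on athena.rc.pawsey.org.au. "Connection refused" generally means nothing accepted the connection on that port. The following is a minimal sketch in Python for probing those ports from the node running the roles; `port_is_open` and `check_ports` are illustrative helper names, not part of any Cloudera tooling, and the 3-second timeout is an arbitrary assumption.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Probe each port on host and return a {port: reachable} mapping."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling `check_ports("athena.rc.pawsey.org.au", [7180, 7184, 7185])` would report which of the ports named in the logs accept connections. If 7180 comes back `False`, the next place to look is probably whatever process is supposed to be listening there rather than the network between the hosts.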

 

Any assistance greatly appreciated.

Cheers

Geoff

 

Service Monitor Log

Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:265)
at com.cloudera.cmf.BasicScmProxy.fetch(BasicScmProxy.java:561)
at com.cloudera.cmf.BasicScmProxy.getFragmentAndHash(BasicScmProxy.java:651)
at com.cloudera.cmf.DescriptorAndFragments.newDescriptorAndFragments(DescriptorAndFragments.java:64)
at com.cloudera.cmon.firehose.Main.main(Main.java:376)
], EXCEPTION_TYPES=[java.net.ConnectException], ROLE=[mgmt-SERVICEMONITOR-8eae562bb2fa67b8957e2284d4164e4c], SEVERITY=[IMPORTANT], SERVICE=[mgmt], HOST_IDS=[f4e2779c-185d-40f4-8906-ce4f86d426f4], LOG_LEVEL=[WARN], ROLE_TYPE=[SERVICEMONITOR], CATEGORY=[LOG_MESSAGE], SERVICE_TYPE=[MGMT], HOSTS=[athena.rc.pawsey.org.au], EVENTCODE=[EV_LOG_EVENT]}, content=Exception while getting fetch configDefaults hash: none, timestamp=1487045012961}
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 2 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 3 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 4 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 5 tries, sleeping...
Could not fetch descriptor after 5 tries, exiting.

 

Host Monitor Log

Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:265)
at com.cloudera.cmf.BasicScmProxy.fetch(BasicScmProxy.java:561)
at com.cloudera.cmf.BasicScmProxy.getFragmentAndHash(BasicScmProxy.java:651)
at com.cloudera.cmf.DescriptorAndFragments.newDescriptorAndFragments(DescriptorAndFragments.java:64)
at com.cloudera.cmon.firehose.Main.main(Main.java:376)
], EXCEPTION_TYPES=[java.net.ConnectException], ROLE=[mgmt-HOSTMONITOR-8eae562bb2fa67b8957e2284d4164e4c], SEVERITY=[IMPORTANT], SERVICE=[mgmt], HOST_IDS=[f4e2779c-185d-40f4-8906-ce4f86d426f4], LOG_LEVEL=[WARN], ROLE_TYPE=[HOSTMONITOR], CATEGORY=[LOG_MESSAGE], SERVICE_TYPE=[MGMT], HOSTS=[athena.rc.pawsey.org.au], EVENTCODE=[EV_LOG_EVENT]}, content=Exception while getting fetch configDefaults hash: none, timestamp=1487045012944}
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 2 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 3 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 4 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 5 tries, sleeping...
Could not fetch descriptor after 5 tries, exiting.

 

Event Server Log

Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:265)
at com.cloudera.cmf.BasicScmProxy.fetch(BasicScmProxy.java:561)
at com.cloudera.cmf.BasicScmProxy.getFragmentAndHash(BasicScmProxy.java:651)
at com.cloudera.cmf.DescriptorAndFragments.newDescriptorAndFragments(DescriptorAndFragments.java:64)
at com.cloudera.cmf.eventcatcher.server.EventCatcherService.main(EventCatcherService.java:100)
], EXCEPTION_TYPES=[java.net.ConnectException], ROLE=[mgmt-EVENTSERVER-8eae562bb2fa67b8957e2284d4164e4c], SEVERITY=[IMPORTANT], SERVICE=[mgmt], HOST_IDS=[f4e2779c-185d-40f4-8906-ce4f86d426f4], LOG_LEVEL=[WARN], ROLE_TYPE=[EVENTSERVER], CATEGORY=[LOG_MESSAGE], SERVICE_TYPE=[MGMT], HOSTS=[athena.rc.pawsey.org.au], EVENTCODE=[EV_LOG_EVENT]}, content=Exception while getting fetch configDefaults hash: none, timestamp=1487045014712}
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 2 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 3 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 4 tries, sleeping...
No descriptor fetched from http://athena.rc.pawsey.org.au:7180 on after 5 tries, sleeping...
Could not fetch descriptor after 5 tries, exiting.

 

Alert Publisher Log

Consumer Consumer[event://athena.rc.pawsey.org.au:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000] could not poll endpoint: event://athena.rc.pawsey.org.au:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000 caused by: java.net.ConnectException: Connection refused
org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
	at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
	at com.sun.proxy.$Proxy9.queryEvents(Unknown Source)
	at com.cloudera.cmf.event.query.AvroEventStoreQueryProxy.doQuery(AvroEventStoreQueryProxy.java:160)
	at com.cloudera.enterprise.alertpublisher.component.EventStoreConsumer.poll(EventStoreConsumer.java:167)
	at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:97)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
	at sun.net.www.http.HttpClient.New(HttpClient.java:308)
	at sun.net.www.http.HttpClient.New(HttpClient.java:326)
	at sun.net.www.protocol.ht

 

1 REPLY

Re: Cloudera Management Services fails to start after upgrade


Hi,

 

I have the same issue. Were you able to resolve this?