Created on 04-21-2015 01:41 PM - edited 09-16-2022 02:26 AM
Hi
I have installed CDH 5.3.3 successfully on Ubuntu 14.04, but when I reboot the system, the Cloudera Management Service roles (Event Server, Host Monitor, Service Monitor) fail to start.
They show the following errors:
Event Server :
12:37:40.957 PM WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry
Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:188)
at com.cloudera.cmf.BasicScmProxy.authenticateAndFetchScmDescriptor(BasicScmProxy.java:301)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:346)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:326)
at com.cloudera.cmf.eventcatcher.server.EventCatcherService.main(EventCatcherService.java:100)
], EXCEPTION_TYPES=[java.net.ConnectException], ROLE=[mgmt-EVENTSERVER-e8c92ccecb4376455a55563353303d3f], SEVERITY=[IMPORTANT], SERVICE=[mgmt], HOST_IDS=[0001a5a1-846a-4022-b4c6-a204abd12813], LOG_LEVEL=[WARN], ROLE_TYPE=[EVENTSERVER], CATEGORY=[LOG_MESSAGE], SERVICE_TYPE=[MGMT], HOSTS=[master.novalocal], EVENTCODE=[EV_LOG_EVENT]}, content=IOException while getting descriptor, timestamp=1429645060761}
12:37:42.831 PM WARN com.cloudera.cmf.eventcatcher.server.EventCatcherService
No descriptor fetched from http://master.novalocal:7180 on after 2 tries, sleeping...
12:37:44.833 PM WARN com.cloudera.cmf.eventcatcher.server.EventCatcherService
No descriptor fetched from http://master.novalocal:7180 on after 3 tries, sleeping...
12:37:46.837 PM WARN com.cloudera.cmf.eventcatcher.server.EventCatcherService
No descriptor fetched from http://master.novalocal:7180 on after 4 tries, sleeping...
12:37:48.838 PM WARN com.cloudera.cmf.eventcatcher.server.EventCatcherService
No descriptor fetched from http://master.novalocal:7180 on after 5 tries, sleeping...
12:37:50.838 PM ERROR com.cloudera.cmf.eventcatcher.server.EventCatcherService
Could not fetch descriptor after 5 tries, exiting.
Service Monitor :
12:37:39.548 PM WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry
Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:188)
at com.cloudera.cmf.BasicScmProxy.authenticateAndFetchScmDescriptor(BasicScmProxy.java:301)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:346)
at com.cloudera.cmon.firehose.Main.main(Main.java:374)
], EXCEPTION_TYPES=[java.net.ConnectException], ROLE=[mgmt-SERVICEMONITOR-e8c92ccecb4376455a55563353303d3f], SEVERITY=[IMPORTANT], SERVICE=[mgmt], HOST_IDS=[0001a5a1-846a-4022-b4c6-a204abd12813], LOG_LEVEL=[WARN], ROLE_TYPE=[SERVICEMONITOR], CATEGORY=[LOG_MESSAGE], SERVICE_TYPE=[MGMT], HOSTS=[master.novalocal], EVENTCODE=[EV_LOG_EVENT]}, content=IOException while getting descriptor, timestamp=1429645059382}
12:37:41.451 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 2 tries, sleeping...
12:37:43.452 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 3 tries, sleeping...
12:37:45.454 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 4 tries, sleeping...
12:37:47.456 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 5 tries, sleeping...
12:37:49.456 PM ERROR com.cloudera.cmon.firehose.Main
Could not fetch descriptor after 5 tries, exiting.
Host Monitor :
12:37:40.676 PM WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry
Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:188)
at com.cloudera.cmf.BasicScmProxy.authenticateAndFetchScmDescriptor(BasicScmProxy.java:301)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:346)
at com.cloudera.cmon.firehose.Main.main(Main.java:374)
], EXCEPTION_TYPES=[java.net.ConnectException], ROLE=[mgmt-HOSTMONITOR-e8c92ccecb4376455a55563353303d3f], SEVERITY=[IMPORTANT], SERVICE=[mgmt], HOST_IDS=[0001a5a1-846a-4022-b4c6-a204abd12813], LOG_LEVEL=[WARN], ROLE_TYPE=[HOSTMONITOR], CATEGORY=[LOG_MESSAGE], SERVICE_TYPE=[MGMT], HOSTS=[master.novalocal], EVENTCODE=[EV_LOG_EVENT]}, content=IOException while getting descriptor, timestamp=1429645060526}
12:37:42.599 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 2 tries, sleeping...
12:37:44.601 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 3 tries, sleeping...
12:37:46.602 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 4 tries, sleeping...
12:37:48.603 PM WARN com.cloudera.cmon.firehose.Main
No descriptor fetched from http://master.novalocal:7180 on after 5 tries, sleeping...
12:37:50.604 PM ERROR com.cloudera.cmon.firehose.Main
Could not fetch descriptor after 5 tries, exiting.
Your help is much appreciated.
Regards
Prateek
Created 06-22-2015 05:56 AM
Hi Prateek,
Were you able to resolve this?
I'm facing the same problem.
Thanks,
shr1k
Created 06-22-2015 11:00 PM
Hello,
If you are getting the same errors as in the original post, where each management service cannot fetch the descriptor, that indicates a problem with the management services contacting Cloudera Manager. In order to know which hosts, services, and roles participate in the cluster (among other things), the management services must be able to retrieve a descriptor for the cluster from CM upon startup. If it cannot be retrieved, the management service will fail to start.
The original post shows that the root cause is that no connection can be made. If SSL is not involved, then typically a firewall or some other configuration issue is preventing the management services from resolving or connecting to CM. If you have exactly the same stack traces, I would try shutting off any firewalls on the hosts where the management services run and where Cloudera Manager runs. Try using telnet or ncat to ensure that you can make a connection from the management service host to the CM host.
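If neither telnet nor ncat is installed, the same connectivity test can be sketched with bash alone. This is only a minimal sketch: the function name check_port is made up for illustration, and it assumes a bash built with /dev/tcp support plus the coreutils timeout command. The host and port in the comment are the ones from the original post's logs.

```shell
# Hypothetical helper: returns 0 if a TCP connection to host:port can be
# opened within the timeout (default 3 seconds), non-zero otherwise.
# Assumes bash with /dev/tcp support and coreutils `timeout`.
check_port() {
  local host="$1" port="$2" timeout_s="${3:-3}"
  # Host and port are passed as positional arguments to the inner bash,
  # which tries to open a socket via the /dev/tcp pseudo-device.
  timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, run from the management service host:
#   check_port master.novalocal 7180 && echo reachable || echo "not reachable"
```

A non-zero result here matches the "Connection refused" in the stack traces: either nothing is listening on that port yet, or something between the two hosts is blocking it.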
If that doesn't help, you might post the exceptions you are seeing as there may be something different about the cause in your case.
-Ben
Created 06-05-2016 09:15 AM
Fresh simple default install of latest 5.7.x.p0.76. Cloudera Manager and HMS, HS2, HS, NM, SM and OS roles are on one node, data nodes and other roles are elsewhere.
Cluster shows good health, charts are updated.
Restarting this node results in no metrics and Host Monitor "connection refused" messages. Restarting the Cloudera Management Service solves the problem.
Is there a way to restart the node without having to manually restart the Cloudera Management Service afterwards?
Is there a race condition?
Created on 01-11-2017 12:55 AM - edited 01-11-2017 01:14 AM
Created on 01-11-2017 01:09 AM - edited 01-11-2017 02:25 AM
Hi,
I work with CDH 5.4.7, had the same issue, and resolved it.
When the Cloudera Manager server host is restarted after an upgrade or maintenance tasks, the Cloudera server and the Cloudera agent start, but the Cloudera Management Services (mgmt) do not.
The reason is that cloudera-scm-server and cloudera-scm-agent are configured to start at the same time:
[ cloudera_server ]: grep chkconfig /etc/init.d/cloudera-scm-*
/etc/init.d/cloudera-scm-agent:# chkconfig: 2345 90 10
/etc/init.d/cloudera-scm-server:# chkconfig: 2345 90 10
The Cloudera Agent starts the Cloudera Management Services (mgmt), which need to connect to the Cloudera Server, but the Cloudera Server takes longer to start than the Cloudera Agent. Each mgmt role tries to connect 5 times with only 2 seconds between tries and then gives up, so mgmt cannot start (in the mgmt role logs I can see "connection refused" errors):
2017-01-02 15:06:44,673 WARN com.cloudera.cmon.firehose.Main: No descriptor fetched from https://cloudera_server:7183 on after 1 tries, sleeping...
2017-01-02 15:06:44,798 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[SERVICEMONITOR], EXCEPTION_TYPES=[java.net.ConnectException], HOST_IDS=[..], STACKTRACE=[java.net.ConnectException: Connection refused
[..]
2017-01-02 15:06:46,708 WARN com.cloudera.cmon.firehose.Main: No descriptor fetched from https://cloudera_server:7183 on after 2 tries, sleeping...
[..]
2017-01-02 15:06:52,724 WARN com.cloudera.cmon.firehose.Main: No descriptor fetched from https://cloudera_server:7183 on after 5 tries, sleeping...
To work around this issue temporarily, I did the following:
1. Change the start order:
I changed it to this (server from 90 to 89):
[ cloudera_server ]: grep chkconfig /etc/init.d/cloudera-scm-*
/etc/init.d/cloudera-scm-agent:# chkconfig: 2345 90 10
/etc/init.d/cloudera-scm-server:# chkconfig: 2345 89 10
2. Add a Cloudera server check to the agent init start script.
/etc/init.d/cloudera-scm-agent (added line marked with +):
---
[..]
start() {
[..]
+ for i in $(seq 1 30); do curl -k -s -I $(facter cdh_url | awk -F\/api '{print $1}') | grep -q '200 OK' &>/tmp/init_cloudera_agent.out && break; sleep 10; done
$CMF_SUDO_CMD /bin/bash -c "nohup $AGENT_SCRIPT $CMF_AGENT_ARGS" >> $AGENT_OUT 2>&1 </dev/null &
[..]
}
[..]
---
* cdh_url is a custom facter fact that returns https://cloudera_server:7183/api/v10
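For hosts without facter, the same wait-for-server idea can be sketched as a small standalone helper. This is not the shipped init script, only an illustration: the function names wait_until and cm_is_up are hypothetical, and the probe assumes, like the line above, that the Cloudera Manager base URL answers with HTTP 200 once the server is up.

```shell
# Hypothetical helper: run a probe command up to $1 times, sleeping $2
# seconds between failed attempts. Returns 0 as soon as the probe
# succeeds, 1 if every attempt failed.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical probe: succeeds when the given URL answers with HTTP 200.
# -k skips certificate verification, as in the curl call above.
cm_is_up() {
  [ "$(curl -k -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}
```

Called as "wait_until 30 10 cm_is_up https://cloudera_server:7183", it would wait up to roughly five minutes, the same 30 attempts at 10-second intervals as the loop above, before the agent is started.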
Changing only the start order doesn't work, because "/etc/init.d/cloudera-scm-server start" doesn't wait for the server to be completely started; it returns OK immediately while the server is still starting in the background. When I reboot this server, cloudera-scm-server starts and cloudera-scm-agent starts immediately afterwards. The agent comes up faster than the server, so mgmt cannot connect to the Cloudera server web interface; after 5 tries it is still down and I need to start mgmt manually.
With both changes it works fine, but I don't think I should have to change these configurations…
Another valid solution would be for cloudera-scm-server to return OK only once it has started completely, so that the server starts first and then the agent with the mgmt services. For the moment, though, this workaround works for me.
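To check the start order on a given host without reading the init scripts by eye, the chkconfig header can be parsed directly. The sketch below is only an illustration with hypothetical function names (start_priority, starts_before); it assumes the header has the "# chkconfig: <runlevels> <start> <stop>" form shown in the grep output above.

```shell
# Hypothetical helper: print the start priority from the chkconfig header
# of an init script, e.g. "# chkconfig: 2345 90 10" gives 90.
start_priority() {
  awk '/^# chkconfig:/ { print $4; exit }' "$1"
}

# Hypothetical helper: returns 0 if the first script has a lower start
# priority number (and therefore starts earlier) than the second.
starts_before() {
  [ "$(start_priority "$1")" -lt "$(start_priority "$2")" ]
}

# Example:
#   starts_before /etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-agent \
#     && echo "server is ordered before agent"
```

As described above, a correct order alone is not enough here, because the server init script returns before the server is actually accepting connections.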
Marc.
Created 05-22-2017 08:59 PM
Hi,
I encountered the same issue, and resolved it.
cdh5.9.1
Cloudera Management Service > Configuration > Search "Descriptor"
Set "Descriptor Fetch Max Tries" to a larger value - 60 (default: 5)
I left "Descriptor Fetch Tries Interval" as default - 2 seconds.
Result (HOSTMONITOR log):
2017-05-23 12:09:56,804 WARN com.cloudera.cmon.firehose.Main: No descriptor fetched from http://cloudera-manager-server:7180 on after 27 tries, sleeping...
2017-05-23 12:09:58,805 WARN com.cloudera.cmon.firehose.Main: No descriptor fetched from http://cloudera-manager-server:7180 on after 28 tries, sleeping...
2017-05-23 12:10:00,806 WARN com.cloudera.cmon.firehose.Main: No descriptor fetched from http://cloudera-manager-server:7180 on after 29 tries, sleeping...
2017-05-23 12:10:04,029 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2017-05-23 12:10:04,182 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2017-05-23 12:10:07,595 INFO com.cloudera.cmon.firehose.Main: SCM descriptor fragments fetched successfully
The other mgmt services behave the same way.
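As a rough sizing aid for these two settings, the time a role keeps retrying is approximately max tries multiplied by the interval. The helper name retry_window_s below is made up for illustration, and the estimate ignores how long each attempt itself takes.

```shell
# Hypothetical helper: approximate number of seconds a mgmt role keeps
# retrying, given max tries ($1) and the interval in seconds ($2).
retry_window_s() {
  echo $(( $1 * $2 ))
}

retry_window_s 5 2    # defaults: prints 10
retry_window_s 60 2   # raised max tries: prints 120
```

In the log above the descriptor was fetched after 29 tries, so on that host the server needed about a minute to come up: well past the roughly 10-second default window, but inside the roughly 120-second one.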
Nob.
Created 06-02-2017 07:23 AM
Created on 06-22-2015 03:39 PM - edited 06-22-2015 03:39 PM
Hi, please paste the contents of your /etc/hosts here, along with the output of service --status-all from the command prompt.
BR
Sam