Member since
11-29-2013
11-20-2014
03:49 PM
Finally, the issue has been solved! Indeed, Darren, the problem was in the hostname resolution. We looked at the HOSTS table in the database: instead of an FQDN, one entry was "localhost" and another node pointed to 127.0.0.1! Changing the values solved the issue. Be aware that an agent restart updates the values again, so /etc/hosts must be set correctly. The majority of service configurations were corrupted as well, having "localhost" instead of a remote FQDN. I don't know how this could have happened out of the blue. Thanks, Darren 🙂 Case closed! Gin.
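For anyone who hits the same symptom, a quick check of what a node's name resolves to can be sketched as below. This is only an illustration, not something Cloudera Manager ships: the function name loopback_addresses and the injectable resolver parameter are hypothetical, and the only real API used is Python's standard socket.getaddrinfo.

```python
import ipaddress
import socket


def loopback_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the loopback addresses that `hostname` resolves to (empty list if none).

    `resolver` is a hypothetical injection point so the check can be
    exercised without touching the real resolver; it defaults to the
    standard library's socket.getaddrinfo.
    """
    results = resolver(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = {entry[4][0] for entry in results}
    return sorted(
        address
        for address in addresses
        # Strip an IPv6 zone suffix such as "%lo0" before parsing.
        if ipaddress.ip_address(address.split("%")[0]).is_loopback
    )
```

Calling loopback_addresses(socket.getfqdn()) on each node should return an empty list when the name resolves to a routable address; a non-empty result means the name maps to 127.0.0.1 or ::1, which is the kind of entry described above.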
11-20-2014
03:04 AM
Another problem: when trying to download a full log file via Cloudera Manager I get:
HTTP ERROR 502 Problem accessing /cmf/process/all/logs/download. Reason: Connection refused
Could not connect to host.
I thought of debugging Jetty, but it overrides the logging settings: "-Dlog4j.configuration=file:/etc/cloudera-scm-server/log4j.properties -Dcmf.root.logger=INFO,LOGFILE" and I don't know where the startup settings are kept.
11-20-2014
02:16 AM
OK, so here is what I have done: I reinstalled the Cloudera Manager roles. Now the monitor works, but it reports health issues: "WARNING hostname ip-10-0-1-171.compute.internal differs from the canonical name localhost".
/etc/hosts contents:
127.0.0.1 localhost
10.0.1.171 ip-10-0-1-171.compute.internal
Is there a problem with my hosts file?
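One way to reason about this warning is to look up which hosts-file line a name matches and what the first name on that line is. The sketch below assumes the common resolver convention that the first name on a matching line is treated as the canonical name; canonical_entry is a hypothetical helper written for this illustration, not part of any Cloudera tool.

```python
def canonical_entry(hosts_text, hostname):
    """Return (address, canonical_name) for the first hosts line listing `hostname`.

    Assumption: the canonical name is the first name on the matching line.
    Returns None when no line mentions the hostname.
    """
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address without names
        address, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            return address, names[0]
    return None


# The file as posted: the hostname has its own line and is its own canonical name.
POSTED = "127.0.0.1 localhost\n10.0.1.171 ip-10-0-1-171.compute.internal\n"
# A hypothetical layout that would yield "canonical name localhost".
ALIASED = "127.0.0.1 localhost ip-10-0-1-171.compute.internal\n"
```

With the posted contents, canonical_entry(POSTED, "ip-10-0-1-171.compute.internal") gives the 10.0.1.171 line, so under this assumption the file itself looks fine and the "localhost" canonical name would have to come from somewhere else.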
11-20-2014
12:52 AM
"It sounds from the log like ServiceMonitor died. Can you answer my previous question about your management roles and whether each of them are running?"
- All roles are running.
"If not, what happens when you restart them?"
- Restart doesn't change anything. All roles report one of the two errors: "connection refused" and "error while getting descriptor" (from web:7180).
The code snippet from the previous post includes error messages from:
All is OK in: /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-SERVICEMONITOR-localhost.log.out
An error in: /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-HOSTMONITOR-localhost.log.out
2014-11-20 07:01:21,528 WARN com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher: Failed to send messages to SMON.
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy19.writeStatusRecords(Unknown Source)
at com.cloudera.cmon.firehose.BasicFirehoseClient.writeStatusRecords(BasicFirehoseClient.java:74)
at com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher.processRecords(HMONToSMONHostSubjectRecordPublisher.java:106)
at com.cloudera.cmon.tstore.leveldb.LDBSubjectRecordStore.write(LDBSubjectRecordStore.java:400)
at com.cloudera.cmon.kaiser.HMONTestRunner.runHostTestsForSession(HMONTestRunner.java:83)
at com.cloudera.cmon.kaiser.HMONTestRunner.runTestsForSession(HMONTestRunner.java:65)
at com.cloudera.cmon.kaiser.BaseTestRunner.runTestsOnAllSubjects(BaseTestRunner.java:148)
at com.cloudera.cmon.kaiser.KaiserService$KaiserServiceRunner.run(KaiserService.java:179)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
... 9 more
Caused by: java.net.ConnectException: Connection refused
EventServer problem: /var/log/cloudera-scm-eventserver/mgmt-cmf-mgmt-EVENTSERVER-localhost.log.out
2014-11-20 08:33:15,531 ERROR com.cloudera.cmf.eventcatcher.server.EventMetricsPublisher: Could not publish metrics to HMON:
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy19.writeMetrics(Unknown Source)
at com.cloudera.cmon.firehose.BasicFirehoseClient.writeMetrics(BasicFirehoseClient.java:86)
at com.cloudera.cmf.eventcatcher.server.EventMetricsPublisher.publishToHMON(EventMetricsPublisher.java:173)
at com.cloudera.cmf.eventcatcher.server.EventMetricsPublisher.run(EventMetricsPublisher.java:103)
at com.cloudera.enterprise.PeriodicEnterpriseService$UnexceptionablePeriodicRunnable.run(PeriodicEnterpriseService.java:67)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
... 6 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
What are SMON and HMON? Are these databases or the names of services? My db.properties lists these databases: scm, amon, rman, nav (no smon or hmon). I can connect to all of them manually using psql.
"Is this the right CM URL when inside your cluster?"
- Yep, I can access the web manager.
The biggest issue is that the error messages and exceptions are not helpful at all. They do not provide any debug information or traces, but merely say that an "error has occurred".
11-19-2014
02:04 PM
Hi Darren, thanks for the reply. Well, there are two main errors: "connection refused" and "error while getting descriptor" (from web:7180):
####################AGENT:
[19/Nov/2014 17:35:18 +0000] 3160 MonitorDaemon-Reporter throttling_logger ERROR (9 skipped) Error sending messages to firehose: mgmt-HOSTMONITOR-cd23de091d9f93f400336360b549bb6a
Traceback (most recent call last):
File "/usr/lib/cmf/agent/src/cmf/monitor/firehose.py", line 71, in _send
self._port)
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 464, in __init__
self.conn.connect()
File "/usr/lib/python2.7/httplib.py", line 757, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
####################EVENTSERVER:
2014-11-19 16:03:24,265 WARN com.cloudera.cmf.BasicScmProxy: IOException while getting descriptor
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:188)
at com.cloudera.cmf.BasicScmProxy.authenticateAndFetchScmDescriptor(BasicScmProxy.java:301)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:346)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:326)
at com.cloudera.cmf.eventcatcher.server.EventCatcherService.main(EventCatcherService.java:100)
2014-11-19 16:03:24,286 WARN com.cloudera.cmf.eventcatcher.server.EventCatcherService: No descriptor fetched from http://ip-10-0-1-1.eu-west-1.compute.internal:7180 on after 1 tries, sleeping...
2014-11-19 16:03:24,421 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={STACKTRACE=[java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at com.cloudera.cmf.BasicScmProxy.authenticate(BasicScmProxy.java:188)
at com.cloudera.cmf.BasicScmProxy.authenticateAndFetchScmDescriptor(BasicScmProxy.java:301)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:346)
at com.cloudera.cmf.BasicScmProxy.getScmDescriptor(BasicScmProxy.java:326)
at com.cloudera.cmf.eventcatcher.server.EventCatcherService.main(EventCatcherService.java:100)
####################FIREHOSE:
2014-11-19 16:03:32,084 WARN com.cloudera.cmon.firehose.Main: No descriptor fetched from http://ip-10-0-1-1.eu-west-1.compute.internal:7180 on after 5 tries, sleeping...
2014-11-19 16:03:34,085 ERROR com.cloudera.cmon.firehose.Main: Could not fetch descriptor after 5 tries, exiting.
####################Postgres:
LOG: unexpected EOF on client connection
Another interesting point is the log file names. In CMS they contain the localhost keyword, e.g.: /var/log/cloudera-scm-eventserver/mgmt-cmf-mgmt-EVENTSERVER-localhost.log.out
But on the filesystem, for some reason, they use the actual IP-based name: ..."EVENTSERVER-ip-10-0-1-1.eu-west-1.compute.internal.log.out". I don't know if this is how it should be, but it worked just fine two days ago. In the worst case, is there a risk of losing data if I reinstall Cloudera Manager and add the existing cluster services to it?
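Since every failure above ends in "Connection refused", a plain TCP probe from the affected node can help separate "nothing is listening on that host and port" from a problem inside the application. A minimal sketch using only Python's standard socket module follows; can_connect is a hypothetical helper name, and the host and port to probe would be the ones from the log lines (for example the server URL on port 7180).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Any OSError (connection refused, timeout, name resolution failure)
    is reported as False; this only tests reachability, not the service.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If can_connect returns False for a name but True for the node's real IP address on the same port, that points toward name resolution (for instance a name resolving to 127.0.0.1) rather than a dead service.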
11-19-2014
08:40 AM
Hello,
Today CDH 5.1.3 suddenly stopped working. Health monitoring no longer works, but I can still access Cloudera Manager (web). First things first: I took a look at cloudera-scm-server.log, and here is the output:
2014-11-19 16:20:24,106 INFO [1310736637@agentServer-0:components.StalenessChecker@69] No staleness check scheduled, scheduling one in 30 seconds
2014-11-19 16:20:32,103 ERROR [WebServerImpl:cmf.TsqueryAutoCompleter@391] Error getting predicates
org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
at com.sun.proxy.$Proxy94.getImpalaFilterMetadata(Unknown Source)
at com.cloudera.cmf.protocol.firehose.nozzle.TimeoutNozzleIPC.getImpalaFilterMetadata(TimeoutNozzleIPC.java:377)
at com.cloudera.server.web.cmf.impala.components.ImpalaDao.fetchFilterMetadata(ImpalaDao.java:688)
at com.cloudera.server.web.cmf.work.AbstractWorkDao.getAndUpdateAutoCompleter(AbstractWorkDao.java:117)
at com.cloudera.server.web.cmf.TsqueryAutoCompleter.<init>(TsqueryAutoCompleter.java:181)
at com.cloudera.server.web.cmf.charts.TimeSeriesQueryController.initialize(TimeSeriesQueryController.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:340)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:293)
at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:130)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyBeanPostProcessorsBeforeInitialization(AbstractAutowireCapableBeanFactory.java:394)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1413)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:293)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:290)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:192)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:585)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
at org.springframework.web.servlet.FrameworkServlet.createWebApplicationContext(FrameworkServlet.java:467)
at org.springframework.web.servlet.FrameworkServlet.createWebApplicationContext(FrameworkServlet.java:483)
at org.springframework.web.servlet.FrameworkServlet.initWebApplicationContext(FrameworkServlet.java:358)
at org.springframework.web.servlet.FrameworkServlet.initServletBean(FrameworkServlet.java:325)
at org.springframework.web.servlet.HttpServletBean.init(HttpServletBean.java:127)
at javax.servlet.GenericServlet.init(GenericServlet.java:241)
at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:440)
at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:263)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:736)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at com.cloudera.server.cmf.WebServerImpl.run(WebServerImpl.java:277)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:71)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
... 40 more
2014-11-19 16:20:36,014 INFO [JvmPauseMonitor:debug.JvmPauseMonitor@236] Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 1146ms: GC pool 'PS MarkSweep' had collection(s): count=1 time=1418ms, GC pool 'PS Scavenge' had collection(s): count=1 time=209ms
2014-11-19 16:20:36,016 INFO [JvmPauseMonitor:debug.JvmPauseMonitor@236] Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 1545ms: GC pool 'PS MarkSweep' had collection(s): count=1 time=1418ms, GC pool 'PS Scavenge' had collection(s): count=1 time=209ms
So my question is: do you have any clue what might have gone wrong, and where should I start? Is there a verbose/debug option? Initially, the memory settings (heap and non-Java) were ~30% of the recommended ones. I have now set them to 100%, but the issue persists. Your help is much appreciated. Gin
Labels: Cloudera Manager