How to notify when one of the services or roles go down

Champion Alumni

Hi,

 

Is there a way to notify an email group when one of the services or roles goes down? Say one of the HBase region servers goes down and I want to notify a support group. How can this be configured in CM 5? I am using CM 5 along with CDH 5.

 

Thanks,

Nishan


Re: How to notify when one of the services or roles go down

Hi Nishan,

 

You can refer to the links below for your alert configuration requirement.

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Adminis...

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Adminis...

 

Please reply if you are still facing issues with the alert configuration.

Regards,
Chirag Patadia.

Re: How to notify when one of the services or roles go down

Expert Contributor
If you are using Cloudera Manager, you can set up alerts from the CM interface, which sends you an email when a specific role or service goes down.
If you are not using Cloudera Manager, you can use Nagios to set up the alerts.
Nagios Documentation: http://nagios.sourceforge.net/docs/3_0/notifications.html
http://www.linux.com/learn/tutorials/316105:setting-up-email-alerts-for-network-monitoring-with-nagi...
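
If you want a scripted check alongside the built-in alerts, something like the following could flag roles that are not running. This is only a rough sketch: it assumes the role list has already been fetched and parsed into Python dictionaries, and the field names `name`, `roleState`, and `healthSummary`, as well as the sample role names, are assumptions made for illustration, not something confirmed in this thread.

```python
def find_unhealthy_roles(roles):
    """Return the names of roles that are not started or not in good health."""
    unhealthy = []
    for role in roles:
        started = role.get("roleState") == "STARTED"
        healthy = role.get("healthSummary") == "GOOD"
        if not (started and healthy):
            unhealthy.append(role.get("name", "<unnamed>"))
    return unhealthy


# Hypothetical sample data; real role names and states will differ.
sample_roles = [
    {"name": "hbase-REGIONSERVER-1", "roleState": "STARTED", "healthSummary": "GOOD"},
    {"name": "hbase-REGIONSERVER-2", "roleState": "STOPPED", "healthSummary": "BAD"},
]

print(find_unhealthy_roles(sample_roles))
```

The returned list could then be handed to whatever sends mail to the support group.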
Em Jay

Re: How to notify when one of the services or roles go down

Champion Alumni

Hi, thank you, guys. I set up my email server, and on sending a test alert I get the exception below. With netstat I see that the port is being used by some process. Can someone help?

 

2014-05-30 16:30:48,215 WARN org.apache.camel.impl.DefaultPollingConsumerPollStrategy: Consumer Consumer[event://hostname:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000] could not poll endpoint: event://hostname:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000 caused by: java.net.ConnectException: Connection refused
org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
at com.sun.proxy.$Proxy9.queryEvents(Unknown Source)
at com.cloudera.cmf.event.query.AvroEventStoreQueryProxy.doQuery(AvroEventStoreQueryProxy.java:160)
at com.cloudera.enterprise.alertpublisher.component.EventStoreConsumer.poll(EventStoreConsumer.java:167)
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:71)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
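
A "Connection refused" generally means that nothing accepted the TCP connection at that host and port, for example because no process is listening on that address or a firewall rejected the attempt. One way to narrow this down is to test, from the host that produced the log above, whether ports 7184 and 7185 accept connections. The snippet below is a minimal sketch of such a check; `EVENT_SERVER_HOST` is a placeholder that would need to be replaced with the host named in the log.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    EVENT_SERVER_HOST = "localhost"  # placeholder; use the host from the log
    for port in (7184, 7185):  # ports taken from the event:// URI in the log
        state = "open" if port_is_open(EVENT_SERVER_HOST, port) else "refused or unreachable"
        print(f"{EVENT_SERVER_HOST}:{port} is {state}")
```

If the ports are open from the host itself but not from the machine that logged the error, the listening address or a firewall between the two hosts is worth checking.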


Re: How to notify when one of the services or roles go down

New Contributor

Hi Nishan,

 

How did you solve this issue? I have the same problem.

 

Thanks in advance.

Senthi

 


Re: How to notify when one of the services or roles go down

Champion Alumni

Hey Senthi,

 

Can you restart your Cloudera SCM agents and try again?

 

Thanks,

Nisha


Re: How to notify when one of the services or roles go down

I see the same issue. Restarting the Cloudera agents did not help me. I got the test mail but have not received any other alerts yet. Is there anything I am missing? Below is my log.

 

 

8:41:02.635 AM INFO com.cloudera.enterprise.alertpublisher.AlertPublisher
Starting Alert Publisher. Version: 5.0.2 (#297 built by jenkins on 20140606-2221 git: 80907df78ba6b50c21a598f0caff8b00685d5961)
8:41:02.757 AM INFO com.cloudera.enterprise.Translator
..............


8:41:02.919 AM INFO com.cloudera.enterprise.Translator
Loading bundle 'activity' from 'activity_pt_PT.properties'
8:41:02.951 AM WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry
Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[ALERTPUBLISHER], CATEGORY=[LOG_MESSAGE], ROLE=[mgmt-ALERTPUBLISHER-820b513ee0845b4ad566b8abfb7e01a5], SEVERITY=[IMPORTANT], SERVICE=[mgmt], HOST_IDS=[c16b1719-782a-4313-bb1a-361cabf80c08], SERVICE_TYPE=[MGMT], LOG_LEVEL=[WARN], HOSTS=[uxlab231.xyz.com], EVENTCODE=[EV_LOG_EVENT]}, content=Translations for locale [it] missing., timestamp=1431438062795}
8:41:02.986 AM INFO com.cloudera.enterprise.Translator
Loading bundle 'activity' from 'activity_pt_BR.properties'
........................


8:41:03.160 AM INFO com.cloudera.enterprise.Translator
Loading bundle 'message.metrics' from 'message.metrics_ko.properties'
8:41:03.168 AM INFO com.cloudera.enterprise.Translator
Loading bundle 'message.metrics' from 'message.metrics_zh_CN.properties'
8:41:03.206 AM INFO org.apache.camel.impl.MainSupport
Apache Camel 2.7.2 starting
8:41:03.402 AM INFO com.cloudera.enterprise.alertpublisher.AlertPublisher$AlertPublisherRouteBuilder
Setting up html email notification with mail server 'smtp://localhost:25?from=noreply%40localhost&to=root%40localhost%2CReddyS2%40xyz.com', mail prefix '[Cloudera Alert]' and max length of email subject '80', hostname 'uxlab230.xyz.com', email header 'null', email footer 'null'
8:41:03.417 AM INFO com.cloudera.enterprise.EnterpriseService
Starting AvroAlertPublisherServer
8:41:03.433 AM INFO org.mortbay.log
Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
8:41:03.536 AM INFO org.mortbay.log
jetty-6.1.26.cloudera.2
8:41:03.568 AM INFO org.mortbay.log
Started SocketConnector@0.0.0.0:10101
8:41:03.568 AM INFO com.cloudera.enterprise.alertpublisher.AvroInternalAlertPublisherAPIServer
Running AvroAlertPublisherServer on port 10101
8:41:03.569 AM INFO org.apache.camel.impl.DefaultCamelContext
Apache Camel 2.7.2 (CamelContext: camel-1) is starting
8:41:03.569 AM INFO org.apache.camel.impl.DefaultCamelContext
JMX is disabled. Using DefaultManagementStrategy.
8:41:04.029 AM INFO org.apache.camel.impl.converter.AnnotationTypeConverterLoader
Found 4 packages with 14 @Converter classes to load
8:41:04.075 AM INFO org.apache.camel.impl.converter.DefaultTypeConverter
Loaded 152 type converters in 0.481 seconds
8:41:04.276 AM INFO org.apache.camel.processor.aggregate.AggregateProcessor
Using CompletionInterval to run every 60000 millis.
8:41:04.329 AM INFO org.apache.camel.impl.DefaultCamelContext
Route: route1 started and consuming from: Endpoint[seda://events]
8:41:04.330 AM INFO org.apache.camel.impl.DefaultCamelContext
Route: route2 started and consuming from: EventStoreEndpoint{URI=event://uxlab231.xyz.com:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000, embedded=false, alertsOnly=false, pollIntervalSecs=10}
8:41:04.330 AM INFO org.apache.camel.impl.DefaultCamelContext
Route: route3 started and consuming from: Endpoint[seda://alerts]
8:41:04.331 AM INFO org.apache.camel.impl.DefaultCamelContext
Total 3 routes, of which 3 is started.
8:41:04.331 AM INFO org.apache.camel.impl.DefaultCamelContext
Apache Camel 2.7.2 (CamelContext: camel-1) started in 0.761 seconds
8:41:04.413 AM WARN org.apache.camel.impl.DefaultPollingConsumerPollStrategy
Consumer Consumer[event://uxlab231.xyz.com:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000] could not poll endpoint: event://uxlab231.xyz.com:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000 caused by: java.net.ConnectException: Connection refused
org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
at com.sun.proxy.$Proxy9.queryEvents(Unknown Source)
at com.cloudera.cmf.event.query.AvroEventStoreQueryProxy.doQuery(AvroEventStoreQueryProxy.java:160)
at com.cloudera.enterprise.alertpublisher.component.EventStoreConsumer.poll(EventStoreConsumer.java:167)
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:203)
at sun.net.www.http.HttpClient.New(HttpClient.java:290)
at sun.net.www.http.HttpClient.New(HttpClient.java:306)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1090)
at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:71)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
... 12 more
8:41:14.331 AM WARN org.apache.camel.impl.DefaultPollingConsumerPollStrategy
Consumer Consumer[event://uxlab231.xyz.com:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000] could not poll endpoint: event://uxlab231.xyz.com:7184?eventStoreHttpPort=7185&eventsQueryTimeoutMillis=60000 caused by: java.net.ConnectException: Connection refused
org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
at com.sun.proxy.$Proxy9.queryEvents(Unknown Source)
at com.cloudera.cmf.event.query.AvroEventStoreQueryProxy.doQuery(AvroEventStoreQueryProxy.java:160)
at com.cloudera.enterprise.alertpublisher.component.EventStoreConsumer.poll(EventStoreConsumer.java:167)
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:97)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:203)
at sun.net.www.http.HttpClient.New(HttpClient.java:290)
at sun.net.www.http.HttpClient.New(HttpClient.java:306)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1090)
at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:71)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
... 12 more
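
To see at a glance which endpoint the Alert Publisher fails to poll, the host and ports can be pulled out of the "could not poll endpoint" lines. The following is a small sketch using only the Python standard library; the function name `failed_endpoints` is made up for illustration, and the sample line is copied from the log above.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the event:// URI that follows "could not poll endpoint:" in the log.
ENDPOINT_RE = re.compile(r"could not poll endpoint: (event://\S+)")


def failed_endpoints(log_text):
    """Return a set of (host, port, http_port) tuples for endpoints that could not be polled."""
    found = set()
    for match in ENDPOINT_RE.finditer(log_text):
        parts = urlsplit(match.group(1))
        query = parse_qs(parts.query)
        http_port_values = query.get("eventStoreHttpPort")
        http_port = int(http_port_values[0]) if http_port_values else None
        found.add((parts.hostname, parts.port, http_port))
    return found


sample_line = (
    "Consumer Consumer[event://uxlab231.xyz.com:7184?eventStoreHttpPort=7185"
    "&eventsQueryTimeoutMillis=60000] could not poll endpoint: "
    "event://uxlab231.xyz.com:7184?eventStoreHttpPort=7185"
    "&eventsQueryTimeoutMillis=60000 caused by: java.net.ConnectException: Connection refused"
)

print(failed_endpoints(sample_line))
```

In this log the failing endpoint is on uxlab231.xyz.com, while the Alert Publisher reports its own hostname as uxlab230.xyz.com, so connectivity between those two hosts on ports 7184 and 7185 is a reasonable thing to verify.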