Created 10-04-2017 09:48 PM
Created 10-04-2017 11:44 PM
Sometimes alerts can trigger for a service even though it is running. For example, a service may be technically up but return a 500 error on its web management page. Or a metric alert may be firing because a value is outside its acceptable range.
Can you please provide more context on which alerts are present?
In some older versions of Ambari, alerts would remain for hosts/services which had been removed. So if you're using an older version, it could be that as well.
Created 10-05-2017 03:35 PM
Thanks for your reply. Here is the detailed analysis.
I am getting the following alerts:
Connection failed to http://hostname:8042 (<urlopen error timed out>)
JournalNode Web UI
NodeManager Web UI
Oozie Server Web UI
History Server Web UI
Percent JournalNodes Available
App Timeline Web UI
WebHCat Server Status
Falcon Server Web UI
Metrics Collector - Auto-Restart Status
All the web UIs are working fine, but I don't know why Ambari is still showing alerts. I even ran telnet against the port numbers and it shows as connected. This is a test cluster and has no firewall restrictions.
Created 10-05-2017 05:14 PM
Let's take the first alert: Connection failed to http://hostname:8042 (<urlopen error timed out>)
Remember that this check is run from the specific host to itself. So, to verify that things work, you'd first need to log in to that host and then run wget from that host against the FQDN in the error above. Also, check your environment for any proxy settings, such as "export http_proxy".
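If it helps, here is a minimal Python sketch for the proxy part of that check. The helper name find_proxy_settings is made up for illustration and is not part of Ambari; it only lists proxy-related environment variables for the current user, which may differ from the environment the Ambari agent itself runs in.

```python
import os
import urllib.request


def find_proxy_settings(environ=None):
    """Return the proxy-related environment variables that are set.

    Hypothetical helper for illustration only; not an Ambari API.
    """
    environ = os.environ if environ is None else environ
    names = ("http_proxy", "https_proxy", "no_proxy")
    found = {}
    for key, value in environ.items():
        # Match case-insensitively, since both http_proxy and HTTP_PROXY are common.
        if key.lower() in names and value:
            found[key] = value
    return found


if __name__ == "__main__":
    for name, value in find_proxy_settings().items():
        print(f"{name}={value}")
    # What Python's own urllib would pick up as proxies in this environment.
    print(urllib.request.getproxies())
```

Run it on the host that raises the alert. An empty result only tells you that your login shell has no proxy variables set.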
Created 10-05-2017 05:18 PM
Hi Jonathan, could you please give me the exact steps I should follow? What should I do with wget?
Created 10-05-2017 06:33 PM
You should wget the address specified in the error: the http://hostname:8042 address.
The alerts run on their respective hosts. If the alert is for a NodeManager, then every NodeManager host in your system will attempt to connect to its own FQDN.
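As a rough illustration of a host connecting to its own FQDN, here is a small Python sketch. It is not Ambari's actual alert script. The function names, the default path and the 5-second timeout are assumptions made for the example; only the port 8042 comes from the error above.

```python
import socket
import urllib.error
import urllib.request


def local_alert_url(port, path="/", fqdn=None):
    """Build the URL a host would use to reach a web UI on itself."""
    # socket.getfqdn() returns this machine's fully qualified name as resolved locally.
    host = fqdn or socket.getfqdn()
    return f"http://{host}:{port}{path}"


def check_url(url, timeout=5.0):
    """Try to fetch url and return (ok, detail) instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        return False, f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Covers refused connections, DNS failures and connect timeouts.
        return False, f"connection failed: {exc.reason}"
    except OSError as exc:
        # A timeout while reading the response is not always wrapped in URLError.
        return False, f"connection failed: {exc}"


if __name__ == "__main__":
    url = local_alert_url(8042)
    print(url, check_url(url))
```

Run it on each NodeManager host. If wget succeeds but this fails, compare the FQDN it prints with the one shown in the alert text, since a mismatch in local name resolution is one possible cause.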
Created 10-05-2017 07:32 PM
I got the response below after running wget, Jonathan.
Connecting to abcd207.lport.net|10.4.13.10|:8042... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://abcd207.lport.net:8042/node [following]
--2017-10-05 19:26:12-- http://abcd207.lport.net:8042/node
Reusing existing connection to abcd207.lport.net:8042.
HTTP request sent, awaiting response... 200 OK
Length: 6184 (6.0K) [text/html]
Saving to: “index.html”
100%[=============================================================>] 6,184 --.-K/s in 0s
2017-10-05 19:26:12 (512 MB/s) - “index.html” saved [6184/6184]