I have a cluster with 5 nodes running SLES 11.4 and HDP 2.4. Each node has 8 GB of RAM.
We had some network problems last week because every port on every node had to be opened in the firewall.
Now everything seems to be fixed.
However, Ambari shows 31 critical alerts across many services. Basically, every UI (YARN, Falcon, HBase, and so on) has a timeout alert.
The UIs are reachable from client laptops in the company, and also with wget from the node where Ambari is running.
No service seems to have problems.
I notice that the "24-Hour" field always has the value "0". The alerts have been critical for 7 days.
What does the "24-Hour" field mean?
What is the issue with the alerts? Do I have to reset them?
Thanks in advance!
Thanks for the answer, but the second link concerns an issue with HDP 2.3 that should have been fixed in 2.4. I have this problem with HDP 2.4 on SLES 11.4.
The cluster is OK, but Ambari insists that there is a timeout when checking the UIs: Atlas UI, NodeManager UI, DataNode UI, Oozie UI, etc.
Is it possible that the URL-checking scripts don't pick up the operating system's proxy and firewall configuration?
That could explain it, because the UI URLs are reachable using both a browser and wget.
I believe the 24-Hour column indicates how many alert state changes have occurred in the last 24 hours. As for the alerts themselves, each host is responsible for running its own alerts. So if there is an alert that checks the NameNode Web UI, it runs on the NameNode host.
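If that reading of the column is correct, a value of 0 is exactly what you would expect for an alert that went CRITICAL 7 days ago and has stayed CRITICAL ever since: nothing has changed state in the last 24 hours. The hypothetical Python sketch below illustrates that counting. The `count_state_changes` name and the `(timestamp, state)` history format are assumptions made for illustration, not how Ambari actually stores or computes alert history.

```python
from datetime import datetime, timedelta


def count_state_changes(history, now, window=timedelta(hours=24)):
    """Count alert state transitions that happened within `window` before `now`.

    Hypothetical illustration only: `history` is assumed to be a list of
    (timestamp, state) tuples sorted by timestamp, e.g. one entry per check.
    """
    cutoff = now - window
    changes = 0
    previous_state = None
    for timestamp, state in history:
        # A transition counts only if the state differs from the previous
        # check AND the new state was recorded inside the window.
        if previous_state is not None and state != previous_state and timestamp >= cutoff:
            changes += 1
        previous_state = state
    return changes


now = datetime(2016, 6, 1, 12, 0)
# Went CRITICAL 7 days ago and has stayed CRITICAL: no change in the last 24 hours.
stuck = [
    (now - timedelta(days=8), "OK"),
    (now - timedelta(days=7), "CRITICAL"),
    (now - timedelta(hours=1), "CRITICAL"),
]
print(count_state_changes(stuck, now))
```

By contrast, an alert that flapped OK, CRITICAL, OK within the last few hours would show a count of 2, so a persistent 0 points to a stuck alert rather than an intermittent one.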
You mentioned a proxy. The Ambari agents won't use a Linux proxy when making web requests in a Kerberized environment. They might in an un-Kerberized cluster, but it's not something I would normally expect them to do.
You should make sure the web UIs are accessible from the host on which they are running, using a command like curl.
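If you want to script the same check on each host, here is a minimal Python sketch. The function name `check_web_ui` and the 5-second default timeout are hypothetical. It deliberately bypasses any proxy set in the environment, on the assumption (not a description of Ambari's actual alert code) that the agents do not go through the OS proxy either, so a URL that only works via the proxy will show up as a failure here.

```python
import urllib.error
import urllib.request


def check_web_ui(url, timeout_seconds=5.0):
    """Return (reachable, detail) for an HTTP GET against `url`.

    Hypothetical helper: an empty ProxyHandler disables proxies picked up
    from http_proxy/https_proxy, approximating a direct, non-proxied check.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_seconds) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered (e.g. 401 from a Kerberized UI), so it is up.
        return True, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, or timeout: the UI is not reachable.
        return False, str(err)
```

Run it on the same host as the service, passing the UI URL from your cluster configuration. Treating an HTTP error status as "reachable" is a design choice: only connection failures and timeouts count as failures, which matches the timeout alerts described in the question.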
The original description didn't mention an upgrade from HDP 2.3 to HDP 2.4. How did you upgrade? Did you use Ambari's rolling or express upgrade option?
The HDP stack doesn't affect how the alerts work; they do the same thing regardless of which stack you're running.