Created 01-30-2016 07:48 AM
I have a rather large cluster that has been running pretty smoothly for over a month now. I had just fixed an issue with a few DataNodes and was left with the web interface saying that 5 hosts were in maintenance mode when they were not. I tried turning maintenance mode on and off but it didn't help, so I finally tried a sudo ambari-server restart
Afterwards, I tried to log in again and the web interface wouldn't load. I waited a few minutes, still nada.
Today's entries in /var/log/ambari-server/ambari-server.log had errors regarding the metric server but nothing that seemed relevant. The network settings are all the same and haven't changed, so I am not sure what went wrong. Any ideas of what logs I should read or what I should try?
EDIT:
Thanks for the fast replies!!
I have attached the info requested as files so this post doesn't extend down a mile.
It looks like the ambari service is not listening on port 8080.
EDIT2:
I ended up rolling back the ambari server vm to an earlier checkpoint and luckily there weren't too many errors to deal with. I will update again if the issue comes back.
Created 01-30-2016 08:37 AM
Could you also restart the ambari agents of your cluster? It'd be good to see some of the errors from the ambari-server log. Could you post some of the log or upload the log file?
After you restarted the ambari server, did it show up in the process list (ps -aux | grep ambari-server)? Also make sure you see the ambari process listening on port 8080 (netstat -anop).
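The netstat check above can be scripted. This is only a rough sketch: listening_on_port is a made-up helper name (not part of Ambari), and it assumes Linux netstat output where the local address is column 4 and the state is column 6.

```shell
# Hypothetical helper: succeed if netstat-style output on stdin shows
# a TCP socket in LISTEN state on the given port.
listening_on_port() {
    port="$1"
    awk -v p="$port" '
        # Assumption: column 4 is the local address (0.0.0.0:8080 or :::8080),
        # column 6 is the socket state.
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example use on the Ambari server host:
#   sudo netstat -anp | listening_on_port 8080 && echo "8080 is listening" || echo "nothing on 8080"
#   sudo netstat -anp | listening_on_port 8440 && echo "8440 is listening" || echo "nothing on 8440"
```

If neither port shows up, the server process most likely never finished starting, which points at the server logs rather than at the network.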
Created 01-31-2016 02:51 AM
@Kyle Pifer Do you have access to support?
Is it a production environment?
Ambari Server | Ambari Server host | 8440 | https | Handshake Port for Ambari Agents to Ambari Server |
Created 01-31-2016 02:56 AM
It is non-production in a research lab. Haven't signed up for support yet either.
Created 01-30-2016 11:40 PM
@Kyle Pifer please start ambari server with debug switch and analyze the output. Link
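For the "analyze the output" part, a small filter can help pull the interesting lines out of a long log. This is a sketch under assumptions: last_errors is a hypothetical helper, and it only looks for the words ERROR, Exception, and Caused by, without relying on the exact log layout.

```shell
# Hypothetical helper: print the last N lines (default 20) from stdin
# that mention an error or a Java exception.
last_errors() {
    count="${1:-20}"
    grep -E 'ERROR|Exception|Caused by' | tail -n "$count"
}

# Example use on the Ambari server host:
#   last_errors 20 < /var/log/ambari-server/ambari-server.log
# Assumption: startup output may also land in an ambari-server.out file
# in the same directory; check that it exists before relying on it.
#   last_errors 20 < /var/log/ambari-server/ambari-server.out
```

If the newest matching lines are older than the last restart, the failure is probably not being written to that file at all.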
Created 01-31-2016 01:33 AM
I ran:
sudo ambari-server stop
sudo ambari-server start -v -g
But no errors were displayed and the ambari-server.log was the same.
On one of the nodes, I restarted the agent and checked the log and it is indicating:
Failed to connect to AmbariController:8440/ca due to [Errno 111] Connection refused
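A "Connection refused" usually means the host answered but nothing accepted the connection on that port, which fits the server not listening on 8440. A rough sketch for pulling the refused endpoints out of an agent log so they can be probed one by one follows. refused_endpoints is a hypothetical helper, and the log path in the comments is assumed to be the default agent log location.

```shell
# Hypothetical helper: print each unique host:port that the agent log
# (on stdin) reports as "Connection refused".
refused_endpoints() {
    awk '/Failed to connect to/ && /Connection refused/ {
        for (i = 2; i < NF; i++) {
            if ($i == "to" && $(i - 1) == "connect") {
                target = $(i + 1)
                sub(/^[a-z]+:\/\//, "", target)  # drop an optional scheme such as https://
                sub(/\/.*$/, "", target)         # drop the path, e.g. /ca
                print target
            }
        }
    }' | sort -u
}

# Example use on an agent node (path is an assumption):
#   refused_endpoints < /var/log/ambari-agent/ambari-agent.log
# Each endpoint can then be probed by hand, for example with
#   nc -z -w 3 AmbariController 8440
# if your netcat variant supports -z and -w.
```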
Created 01-31-2016 01:41 AM
Turn off the firewall and try again. @Kyle Pifer
Created 01-31-2016 01:51 AM
Didn't help =(
Created 01-31-2016 01:57 AM
@Kyle Pifer follow this Guide first, and if that gives no results, go through this troubleshooting Guide. If there is still no result, open a support case, and if that's not an option, upgrade to the latest Ambari supported by your HDP version, as Ambari 2.2 only works with HDP 2.3+
Created 01-31-2016 02:47 AM
Unfortunately it is not a UI issue: the web server doesn't appear to be running, as port 8080 never opens to a listening state. I didn't see anything in the troubleshooting guides aside from restarting the ambari-server, which hasn't helped. Also, this was deployed using the latest Ambari 2.2 and the HDP 2.3 stack.
I originally installed the ambari server on a vm so that I could checkpoint it and luckily I had recently taken one and was able to roll back successfully without too many errors.
Created 01-31-2016 02:56 AM
@Kyle Pifer great you had a backup plan but it would've been great knowing the root cause. Cheers.
BTW, please convert your last comment to an answer so we can use it as the best answer and close out the thread.
Created 01-31-2016 03:01 AM
The next time I reboot it I will log better information and see if it reoccurs for better RCA. Thanks for all the help!