Created 01-28-2016 07:17 PM
Hi,
We have a cluster of 10 servers.
One worker server among them has a problem when starting/stopping services from Ambari.
When I invoke any operation on any service (HDFS/HBASE/METRICS) from Ambari, the command takes a very long time to execute.
I searched the Ambari logs and the service logs but could not find any error.
I tried to restart Ambari server and Ambari agent but still no luck.
I had the same problem earlier, and reinstalling ambari-agent fixed it then, but that does not help this time.
I deleted the host from the cluster, cleaned the server completely and added it back to the cluster, but the issue is still the same.
Please advise.
Thanks,
Venkat
Created 01-28-2016 08:10 PM
@Venkata Sridhar Gangavarapu What's in the ambari logs?
Created 01-28-2016 08:30 PM
Double-check the mount points, permissions, firewall rules and config across nodes. You may have an inconsistency.
Created 01-28-2016 08:47 PM
I checked the server and agent logs, but I did not see anything when I invoked the start/stop from Ambari for the problematic server.
Created 01-28-2016 10:42 PM
I tried a manual start/stop on the problematic node and it was very quick. Below are the commands used:
/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf start/stop regionserver
When I stopped the regionserver manually, Ambari noticed the change and reduced the regionserver count, and it increased the count again when I started the regionserver manually. So the problem occurs only when starting/stopping from Ambari, but there is nothing related to the delay in either ambari-server.log or ambari-agent.log.
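To put a number on "very quick" and compare it with the duration Ambari reports for the same operation, something like the sketch below could be used. The helper name elapsed_seconds is made up for illustration; the script path is the one from the command above.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many whole seconds it took.
# Output of the command itself is discarded so only the duration is printed.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example, to be run on the problematic node (not executed here):
# elapsed_seconds /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh \
#     --config /usr/hdp/current/hbase-regionserver/conf stop regionserver
```

If the manual run takes a few seconds while the same operation takes many minutes from Ambari, that supports the conclusion that the delay is on the Ambari side and not in the service scripts.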
Created 01-28-2016 10:47 PM
@Venkata Sridhar Gangavarapu are ambari-server and agent on the same version?
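One way to check, assuming the hosts are RPM-based (an assumption, the thread does not say which OS is used): print the package version on each host and compare the two strings. The function names pkg_version and versions_match are made up for this sketch.

```shell
#!/bin/sh
# Print the installed version-release of an RPM package.
# Assumes an RPM-based host; on other distributions use the native package tool.
pkg_version() {
    rpm -q --queryformat '%{VERSION}-%{RELEASE}' "$1"
}

# Succeeds (exit status 0) only when both version strings are identical.
versions_match() {
    [ "$1" = "$2" ]
}

# Example (not executed here):
#   on the Ambari server host:  pkg_version ambari-server
#   on the problematic worker:  pkg_version ambari-agent
# then compare the two printed strings, e.g.
#   versions_match "<server version>" "<agent version>" && echo "same version"
```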
Created 03-06-2017 11:53 AM
Hi,
I am having the same problem on two Ambari-managed clusters. In both clusters, the problematic node runs both the Ambari server and an Ambari agent.
Start/Restart/Stop operations take far too long to execute.
Restarting a whole service (for example YARN) is problematic, because all nodes finish restarting except the slow node, which is still restarting its first or second component.
Created 03-13-2018 09:34 AM
Oh yeah! We hit this problem too, on our Sandbox cluster running HDP 2.5.3, a secure cluster with Kerberos + AD integration. We can see that the Ambari server is not sending the commands to the Ambari agent, or is hung on something with the Ambari agent... The gear icon in the settings with the 3 small icons just never appears.... Man... Such a pain, this one is. Today we restarted the services and it took 2 hours for a 4-node cluster....
Hadoop means Elephant, right? Now, I believe it.... for a totally different reason though.... 🙂