
Ambari is taking a long time to restart slave services on one particular server in the cluster

Rising Star

Hi,

We have a cluster of 10 servers.

One worker server among them has a problem when services are started or stopped from Ambari.

When I invoke any operation on any service (HDFS/HBase/Metrics) from Ambari, the command takes a very long time to execute on this server.

I searched the Ambari logs and the service logs but could not find any error.

I tried restarting the Ambari server and the Ambari agent, but still no luck.

I had the same problem earlier, and reinstalling ambari-agent fixed it then, but that does not help this time.

I deleted the host from the cluster, cleaned up the whole server, and added it back to the cluster, but the issue remains.

Please advise.

Thanks,

Venkat

Accepted Solution

Rising Star

I tried starting and stopping the RegionServer manually on the problematic node, and it was very quick. These are the commands I used:

/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf start regionserver
/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf stop regionserver

When I stopped the RegionServer manually, Ambari picked up the change and decreased the RegionServer count, and it increased the count again when I started the RegionServer manually. This confirms that the problem occurs only when starting or stopping from Ambari, yet there is nothing related to the delay in either ambari-server.log or ambari-agent.log.


Replies

@Venkata Sridhar Gangavarapu What's in the Ambari logs?

Mentor

@Venkata Sridhar Gangavarapu

Double-check the mount points, permissions, firewall rules, and configuration across the nodes. You may have an inconsistency.
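One way to act on the configuration part of that advice, sketched under assumptions: copy a config directory (for example /etc/hadoop/conf) from a healthy worker and from the slow worker into two local directories, then diff them. The function name compare_conf_dirs is hypothetical; it only wraps the standard `diff -rq`, and it does not cover mount points, permissions, or firewall rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report config files that differ between two local
# copies of a node's configuration directory (e.g. /etc/hadoop/conf copied
# from a healthy worker and from the slow worker).
compare_conf_dirs() {
  local good="$1" suspect="$2"
  # diff -rq prints only the names of files that differ or exist on one
  # side only; it exits 0 when the two trees are identical.
  if diff -rq "$good" "$suspect"; then
    echo "no differences"
  fi
}
```

Usage: `compare_conf_dirs ./healthy-node-conf ./slow-node-conf`. Lines such as "Files ... differ" or "Only in ..." point at the files worth inspecting.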

Rising Star

I checked the server and agent logs, but I did not see anything logged when I invoked start/stop from Ambari for the problematic server.


Mentor

@Venkata Sridhar Gangavarapu, are ambari-server and ambari-agent on the same version?
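A minimal sketch of that version check, assuming an RPM-based OS (RHEL/CentOS) where Ambari was installed from RPM packages. The helper names pkg_version and check_versions_match are hypothetical; the only real command used is `rpm -q --queryformat`. The idea is to run `pkg_version ambari-server` on the Ambari server host and `pkg_version ambari-agent` on the slow node, then compare the two strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installed version-release of an RPM package.
# Assumes an RPM-based OS; if the package is missing, rpm prints a
# "not installed" message instead and exits non-zero.
pkg_version() {
  rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' "$1"
}

# Hypothetical helper: compare a server version string with an agent
# version string and print a one-line verdict.
check_versions_match() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "MISMATCH: server=$1 agent=$2"
  fi
}
```

Usage: `check_versions_match "$server_ver" "$agent_ver"`, where each variable holds the output of pkg_version captured on the respective host.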

New Contributor

Hi,

I am having the same problem on two Ambari-managed clusters. In both, the problematic node runs both the Ambari server and an Ambari agent.

Start, restart, and stop operations take far too long to execute.

Restarting a whole service (for example, YARN) is problematic: all the other nodes finish restarting, while the slow node is still working on its first or second component.

Explorer

Oh yeah! We hit this problem too, on our sandbox cluster: HDP 2.5.3, secured with Kerberos and integrated with AD. We can see that the Ambari server is either not sending the commands to the Ambari agent or is hung waiting on something from the agent. The gear icon in the settings, with its three small icons, just never appears. Man... such a pain, this one. Today we restarted the services and it took 2 hours for a 4-node cluster.

Hadoop means elephant, right? Now I believe it... for a totally different reason, though. 🙂