
SSH Fence not working?


So sshfence never seems to have worked for me with failover activated.

I enabled sshfence, created an hdfs user in Ambari, generated a key with ssh-keygen for passwordless login, and manually tested the passwordless SSH connection. Everything seems to be set up, yet it still doesn't work: whenever one of my NameNodes failed over in the backend, Ambari falsely reported both as active.
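For anyone repeating the manual SSH test described above, here is a minimal sketch of a non-interactive check. It assumes, as far as I understand sshfence, that the fencing step logs in to the other NameNode over SSH and uses `fuser` to kill the process on the NameNode port, so both a key-only login and a reachable `fuser` binary matter. The host name, user, and key path below are hypothetical placeholders, and the check should be run as the account the failover controller runs under.

```python
import subprocess


def ssh_check_command(host, user="hdfs", key_file="/home/hdfs/.ssh/id_rsa", port=22):
    """Build an ssh command that fails instead of prompting for a password.

    All default values are hypothetical; adjust them to your cluster.
    """
    return [
        "ssh", "-i", key_file, "-p", str(port),
        "-o", "BatchMode=yes",        # never fall back to a password prompt
        "-o", "ConnectTimeout=5",
        f"{user}@{host}",
        # Look for fuser on the remote side, including the sbin directories.
        "PATH=$PATH:/sbin:/usr/sbin; command -v fuser",
    ]


def can_fence(host, **kwargs):
    """Return True if key-based login works and fuser is found on the remote host."""
    try:
        result = subprocess.run(
            ssh_check_command(host, **kwargs),
            capture_output=True, text=True, timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `can_fence("nn2.example.com")` from each NameNode towards the other one shows whether the login works in both directions without a prompt.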

So I went back to the default and used the following script, as it's the only way I can get my primary NN to stay active and the secondary as standby.

5723-not-working.png

5722-failover-script.png

Essentially, the script pings the NNs every minute and, if they respond, checks their current status and forces them into an active:standby state.
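Since the script itself is only attached as a screenshot, here is a rough sketch of the logic just described, not the original script. It uses the `hdfs haadmin` subcommands `-getServiceState`, `-transitionToStandby`, and `-transitionToActive`; the service IDs `nn1` and `nn2` are assumptions, and the optional `--forcemanual` flag is included because haadmin normally refuses manual transitions while automatic failover is enabled (it may also ask for confirmation, which this sketch does not handle).

```python
import subprocess

HAADMIN = ["hdfs", "haadmin"]


def get_state(service_id):
    """Return 'active' or 'standby', or None if the NameNode does not answer."""
    try:
        result = subprocess.run(
            HAADMIN + ["-getServiceState", service_id],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    state = result.stdout.strip().lower()
    if result.returncode == 0 and state in ("active", "standby"):
        return state
    return None


def plan_transitions(preferred, other, states, force_manual=False):
    """Return the haadmin argument lists needed for preferred=active, other=standby."""
    if states.get(preferred) is None or states.get(other) is None:
        return []  # a NameNode is unreachable, so do nothing this round
    flags = ["--forcemanual"] if force_manual else []
    plan = []
    if states[other] == "active":
        # Demote first so that two NameNodes are never asked to be active at once.
        plan.append(["-transitionToStandby", *flags, other])
    if states[preferred] != "active":
        plan.append(["-transitionToActive", *flags, preferred])
    return plan


def reconcile(preferred="nn1", other="nn2", force_manual=False):
    """Check both NameNodes once and apply whatever transitions are needed."""
    states = {sid: get_state(sid) for sid in (preferred, other)}
    for args in plan_transitions(preferred, other, states, force_manual):
        subprocess.run(HAADMIN + args, check=True)
```

Running `reconcile()` from a cron entry once a minute would match the schedule described above. Keeping the decision in `plan_transitions` separate from the commands makes it easy to check what the script would do before letting it touch the cluster.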

Theoretically, having the SNN as active and the NN as standby should be fine. However, Ambari never reports the status correctly: unless I force-transition the nodes, it doesn't report them as active:standby, and hdfs://ClusterName fails to work....

If someone has a better solution I'd love to hear it....

For those wondering, I'm running Ambari 2.2.2.0 and HDP 2.4.2.0 in a CentOS 6 x64 environment.

Additionally, the documentation implies creating a user and writing an SSH script to handle fencing. What I don't get is the point of running such a script if the NN complains that "failover is activated... you cannot manually failover the nodes", or something along those lines.

There's something I'm definitely missing.

Anyhow, the solution above has been working for me, but it doesn't feel clean, and I'd like to know how the community handles HA and what scripting approach you use....
