Hello,
I have run into an unusual issue.
The active NameNode (master01) caused a socket timeout in ZKFC, and shortly afterwards ZKFC performed an automatic failover, making master02 the active NameNode. However, master01 was left in limbo: Ambari showed the NameNode as up but with no HA state (neither active nor standby), while the NameNode process on the server was running and responding to requests.
The NameNode log shows no errors; the ZKFC log shows several SocketTimeout errors (log attached).
To resolve the situation we had to restart the NameNode service on master01, which automatically came up in standby right after starting.
Afterwards I performed several manual failovers, and every one of them succeeded. The system logs show no errors, the LAN has stayed up the whole time, and there are no communication errors between the servers.
As mentioned above, the NameNode service is up and running on both servers.
Do you have any idea what might have happened?
PS: The HDFS service itself works correctly.
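The missing active/standby state that Ambari showed can be cross-checked directly against each NameNode by reading the NameNodeStatus MBean from its JMX servlet. The following is a minimal sketch, assuming the Hadoop 2.x default web UI port 50070 (Hadoop 3.x uses 9870) and that the hosts are reachable by the names master01 and master02; the function names are hypothetical helpers, not part of any Hadoop or Ambari API.

```python
import json
import urllib.request
from typing import Optional

# JMX query for the NameNodeStatus MBean, whose "State" attribute reports
# the HA state ("active", "standby", ...) of the NameNode answering the request.
MBEAN_NAME = "Hadoop:service=NameNode,name=NameNodeStatus"
JMX_QUERY = "/jmx?qry=" + MBEAN_NAME


def parse_ha_state(jmx_json: str) -> Optional[str]:
    """Return the "State" attribute from a JMX JSON response, or None if absent."""
    beans = json.loads(jmx_json).get("beans", [])
    for bean in beans:
        if bean.get("name") == MBEAN_NAME:
            return bean.get("State")
    return None


def fetch_ha_state(host: str, port: int = 50070, timeout: float = 5.0) -> Optional[str]:
    """Query one NameNode's HTTP server and return the HA state it reports.

    Assumption: the Hadoop 2.x default web UI port (50070); pass 9870 on Hadoop 3.x.
    Raises urllib.error.URLError (or a timeout error) if the NameNode does not answer.
    """
    url = f"http://{host}:{port}{JMX_QUERY}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_ha_state(resp.read().decode("utf-8"))
```

For example, calling fetch_ha_state("master01") and fetch_ha_state("master02") on a healthy pair should give one "active" and one "standby". The command `hdfs haadmin -getServiceState <nn-id>` asks the same question over RPC and is a useful second opinion when the web UI and Ambari disagree.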
nn-errors.txt