I had only one Master, on Name Node 1.
HBase was working correctly and its status was green.
I then added a backup Master on a different host (Name Node 2) from the one running the active Master.
So I had a Master (Active) and a Master (Backup), both started.
They come up and run, but after some time HBase turns red because both Masters are down.
Why does this happen? I only added one additional Master role instance on the second node.
How can I resolve this? If I restart them, everything is up and running, but after some time they go down again.
The errors I see are:
Master Health
Master summary: nn1 (Availability: Unknown, Health: Bad), nn2 (Availability: Unknown, Health: Bad).
This health test is bad because the Service Monitor did not find an active Master.
This role's process exited. This role is supposed to be started.
FATAL HMaster Unhandled exception. Starting shutdown.
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-901347749-10.0.9.4-1538722352079:blk_1073741826_1002 file=/hbase/hbase.version
HMaster Failed to become active master
WARN DFSClient Failed to connect to /10.0.9.6:50010 for block BP-901347749-10.0.9.4-1538722352079:blk_1073741826_1002, add to deadNodes and continue.
java.net.ConnectException: Connection refused
WARN BlockReaderFactory I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
The FATAL error shows that the Master could not read /hbase/hbase.version from HDFS. You can try running the command below and check whether it fixes the issue.
sudo -u hbase hbase hbck -fixVersionFile
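
Before or after the repair, it may also be worth checking whether the block is really missing or whether the DataNode holding it is simply unreachable, since the log reports "Connection refused" for 10.0.9.6:50010. The following is only a rough sketch, not a definitive procedure. It assumes the hdfs CLI is installed on the host and that the HDFS superuser account is named hdfs (both are assumptions; adjust for your cluster). The helper extract_datanode is a hypothetical name introduced here for illustration, and the sample line is adapted from the log in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the DataNode address (host:port) out of a
# DFSClient "Failed to connect" warning line. Prints nothing if no match.
extract_datanode() {
  printf '%s\n' "$1" | sed -n 's|.*Failed to connect to /\([0-9.]*:[0-9]*\) for block.*|\1|p'
}

# Sample line adapted from the log in the question.
line='WARN DFSClient Failed to connect to /10.0.9.6:50010 for block BP-901347749-10.0.9.4-1538722352079:blk_1073741826_1002, add to deadNodes and continue. java.net.ConnectException: Connection refused'
dn="$(extract_datanode "$line")"
echo "DataNode refusing connections: $dn"

# Run the HDFS checks only where the hdfs CLI is available.
if command -v hdfs >/dev/null 2>&1; then
  # Assumption: the HDFS superuser is "hdfs".
  # Report the blocks of hbase.version and the DataNodes that hold them.
  sudo -u hdfs hdfs fsck /hbase/hbase.version -files -blocks -locations
  # List live and dead DataNodes as seen by the NameNode.
  sudo -u hdfs hdfs dfsadmin -report
fi
```

If fsck reports the block as missing or the DataNode at that address appears as dead, bringing that DataNode back up is probably the first thing to look at, because the Masters will keep failing while /hbase/hbase.version cannot be read.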