Contributor
Posts: 121
Registered: ‎10-15-2014

hbase Master failed to become active master

My primary master went down during scheduled maintenance, and my backup master failed to take over.

The CM Agent tried to start it, but the master could not initialize the namespace table.

After several manual attempts, and following the steps in

https://community.cloudera.com/t5/Storage-Random-Access-HDFS/HBase-Master-Failed-to-become-active-ma...

I ran only the following in the ZooKeeper CLI:

rmr /hbase/meta-region-server
rmr /hbase/rs
rmr /hbase/splitWAL
rmr /hbase/backup-masters
rmr /hbase/table-lock
rmr /hbase/flush-table-proc
rmr /hbase/region-in-transition
rmr /hbase/running
rmr /hbase/balancer
rmr /hbase/recovering-regions
rmr /hbase/draining
rmr /hbase/namespace
rmr /hbase/hbaseid
rmr /hbase/table
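
Because these removals are destructive, it can help to generate the command list and review it before pasting it into the ZooKeeper CLI. A minimal sketch, assuming the znodes live under /hbase as in the list above; the function name is illustrative, and nothing here connects to ZooKeeper.

```python
# Znode names copied from the list above.
ZNODES = [
    "meta-region-server", "rs", "splitWAL", "backup-masters", "table-lock",
    "flush-table-proc", "region-in-transition", "running", "balancer",
    "recovering-regions", "draining", "namespace", "hbaseid", "table",
]


def build_rmr_commands(parent="/hbase", znodes=ZNODES):
    """Return one 'rmr <path>' line per znode without executing anything."""
    parent = parent.rstrip("/")
    return [f"rmr {parent}/{name}" for name in znodes]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before use.
    print("\n".join(build_rmr_commands()))
```

If the cluster uses a different parent znode, pass it as `parent`.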

I got a master to come up after setting hbase.master.namespace.init.timeout to an absurdly large value.
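
For reference, the override would look roughly like the property element rendered below when placed in hbase-site.xml. This is only a sketch: the 24-hour value is a hypothetical placeholder (the post does not say which value was used), and it assumes the setting is expressed in milliseconds.

```python
import xml.etree.ElementTree as ET


def render_property(name, value):
    """Render a single Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


# Hypothetical value: 24 hours, assuming the unit is milliseconds.
TIMEOUT_MS = 24 * 60 * 60 * 1000

if __name__ == "__main__":
    print(render_property("hbase.master.namespace.init.timeout", TIMEOUT_MS))
```

Raising the timeout only keeps the master from aborting; it does not address whatever is preventing the namespace table from being assigned.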

I see the master registering dead region servers (though I cannot find where it picks them up; they are not in the WAL, archive, or data).

The master status also shows the following,

Starting namespace manager (since 1hrs, 20mins, 5sec ago)

even though Cloudera Manager shows it as healthy.
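
To track how long the master has been stuck, the status string can be converted to seconds. A minimal sketch, assuming the status always has the "Nhrs, Nmins, Nsec" shape shown above; the function name is illustrative and other units (such as days) are not handled.

```python
import re

# Seconds per unit, for the unit spellings seen in the status line.
_UNITS = {"hrs": 3600, "mins": 60, "sec": 1}
_PART = re.compile(r"(\d+)\s*(hrs|mins|sec)")


def stuck_seconds(status):
    """Return the elapsed seconds in '(since ... ago)', or None if absent."""
    match = re.search(r"\(since (.*?) ago\)", status)
    if match is None:
        return None
    return sum(int(n) * _UNITS[unit] for n, unit in _PART.findall(match.group(1)))


if __name__ == "__main__":
    line = "Starting namespace manager (since 1hrs, 20mins, 5sec ago)"
    print(stuck_seconds(line))  # 4805
```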

 

Listing the tables in the HBase shell gives me the following error:

 

hbase(main):004:0> list
TABLE

ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2373)
    at org.apache.hadoop.hbase.master.MasterRpcServices.getTableNames(MasterRpcServices.java:907)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55650)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2182)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
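
PleaseHoldException indicates that the master is reachable but has not finished initializing, so a client can only wait and retry. A minimal sketch of such a wait loop; `probe` is a hypothetical callable standing in for whatever check is used (for example, running `list` through the HBase shell and inspecting the result), and nothing here calls HBase itself.

```python
import time


def wait_until_initialized(probe, attempts=10, delay_s=30.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out.

    Returns the 1-based attempt number that succeeded, or None on give-up.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # back off before the next check
    return None
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.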

fsck /hbase -files -blocks reports HDFS as healthy.

hbck reports zero inconsistencies.

 

I am on these versions:

Hadoop 2.6.0+cdh5.11.1+2400
HBase 1.2.0+cdh5.11.1+319

I did have a master colocated with a region server and was wondering if I ran into this

https://issues.apache.org/jira/browse/HBASE-14861

and then this 

https://issues.apache.org/jira/browse/HBASE-14664

as the cause of the backup master failing to take over.

 

But I cannot determine why the namespace manager would take so long to initialize.

 

Contributor
Posts: 121
Registered: ‎10-15-2014

Re: hbase Master failed to become active master

To add to the list of possible causes: https://issues.apache.org/jira/browse/HBASE-16488

The master is still starting:

Starting namespace manager (since 1hrs, 55mins, 57sec ago)

Contributor
Posts: 121
Registered: ‎10-15-2014

Re: hbase Master failed to become active master

Another update: I left the master alone to see whether it could recover on its own.

Starting namespace manager (since 11hrs, 55mins, 3sec ago)

Unfortunately, I cannot find any logs showing what exactly it is trying to do.
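
When the logs are silent, one option is to take a thread dump of the master process (for example with jstack) and look at the threads whose stacks mention the namespace manager. A minimal sketch that splits a dump into per-thread blocks and keeps those containing a keyword. It assumes the common layout where each thread block starts with a quoted thread name and blocks are separated by blank lines; the function name is illustrative.

```python
def threads_mentioning(dump_text, keyword):
    """Return the thread blocks of a thread dump that contain keyword."""
    # Normalize line endings, then split on blank lines between threads.
    text = dump_text.replace("\r\n", "\n")
    blocks = [block.strip() for block in text.split("\n\n")]
    # Keep only real thread blocks (they begin with a quoted thread name).
    return [b for b in blocks if b.startswith('"') and keyword in b]
```

Searching for a class name such as TableNamespaceManager in successive dumps can show whether the master is blocked at the same frame each time.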

 

Explorer
Posts: 11
Registered: ‎07-10-2014

Re: hbase Master failed to become active master

Wish we could have fixed this for you... The only ideas I had involved some serious low-level debugging / tracing while it was stuck in this state. Hope things are better now.
