
HBase master starts and shuts down after 5 min

Contributor

Hello,

I have a cluster with HDP 2.5 installed. The HBase master starts and then shuts down after 5 minutes (300000 ms) with a stack trace that seems different from the ones in other messages on this forum, which is why I decided to post this:

Here are the errors that fill the master logs:

2016-12-28 12:21:57,531 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=854.48 KB, freeSize=811.72 MB, max=812.55 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=29, evicted=0, evictedPerRun=0.0
2016-12-28 12:22:05,269 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
2016-12-28 12:22:05,271 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.backup.master.BackupController]
2016-12-28 12:22:05,271 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
2016-12-28 12:22:05,271 INFO  [cluster1-node4:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.

The HBase master log is attached here: hbase-master.txt

Thanks for any help that can be provided.

1 ACCEPTED SOLUTION

Super Collaborator

It's a corner-case issue that large HBase clusters do sometimes encounter. It was also raised with the Ambari team so that these values could be corrected and overridden to a fairly moderate default. Please refer to the link below, where Ted Yu has listed the values to be updated:

https://issues.apache.org/jira/browse/AMBARI-16278

Along with what @gsharma suggested, please also update hbase.regionserver.executor.openregion.threads to "20" and test again. This increases the number of concurrent threads available for opening regions and helps the regions initialize faster. Again, the values shipped with Ambari are set to reasonable thresholds and will work fine in most cases; only in corner cases do we run into this namespace initialization issue. If the problem still continues after that, please look at the logs of the specific region server to which the namespace region is being assigned, to see what issues or errors show up there.
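For illustration, here is roughly what that change looks like in hbase-site.xml (just a sketch; in Ambari you would add it as a custom hbase-site property, and the value 20 is only the starting point suggested here):

    <property>
      <!-- Number of handler threads a region server uses to open regions;
           a larger pool helps regions, including hbase:namespace, come online faster. -->
      <name>hbase.regionserver.executor.openregion.threads</name>
      <value>20</value>
    </property>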


5 REPLIES


What other activity is happening before this exception? Is WAL splitting taking place? Have you checked NN health and its logs during the same time period?


Try increasing hbase.master.namespace.init.timeout to a bigger value, say 2400000.
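As a sketch, the property would look like this in hbase-site.xml (2400000 ms is 40 minutes; pick a value that suits your cluster):

    <property>
      <!-- How long the active master waits for the hbase:namespace table
           to be assigned before it aborts. Default is 300000 ms. -->
      <name>hbase.master.namespace.init.timeout</name>
      <value>2400000</value>
    </property>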

Contributor

No specific activity; I am simply restarting the HBase services from Ambari.

I am sorry, I don't know what NN health is.

I don't think the timeout is caused by a slow start, but by:

FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)

Your advice: increase it to 2400000 ms, i.e. 40 minutes?

I am not really keen to let it try for 40 minutes before it reports again that the namespace table could not be assigned.

Contributor

It solves the problem for a cluster with 1 HBase Master and 16 Region Servers.

@Sumesh

However, hbase.regionserver.executor.openregion.threads = 20 did not solve the problem. So I read the associated Ambari issue, and it advises using 200, NOT 20.
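For anyone hitting the same thing, this is roughly the setting that worked here, with the value from the Ambari issue (a sketch only; adjust to your cluster size):

    <property>
      <!-- Value advised in AMBARI-16278; 20 was not enough on this cluster. -->
      <name>hbase.regionserver.executor.openregion.threads</name>
      <value>200</value>
    </property>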

Thanks for the help