HBase master starts and shuts down after 5 min

Contributor

Hello,

I have a cluster with HDP 2.5 installed. The HBase master starts and then shuts down after 5 minutes (300000 ms) with a stack trace that seems different from the ones in other messages on this forum, which is why I decided to post this:

Here are the errors that fill the master logs:

2016-12-28 12:21:57,531 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=854.48 KB, freeSize=811.72 MB, max=812.55 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=29, evicted=0, evictedPerRun=0.0
2016-12-28 12:22:05,269 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
2016-12-28 12:22:05,271 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.backup.master.BackupController]
2016-12-28 12:22:05,271 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
2016-12-28 12:22:05,271 INFO  [cluster1-node4:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.

The HBase master log is attached here: hbase-master.txt

Thanks for any help that may be provided

5 REPLIES


What other activity is happening before this exception? Is WAL splitting taking place? Have you checked the NameNode's health and its logs over the same time period?


Try increasing hbase.master.namespace.init.timeout to a bigger value, say 2400000.
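
For reference, in plain hbase-site.xml terms (with Ambari you would set this through the HBase configs, e.g. under Custom hbase-site, rather than editing the file by hand) the change would look roughly like this:

<property>
  <name>hbase.master.namespace.init.timeout</name>
  <value>2400000</value>
</property>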

Contributor

No specific activity; I am simply restarting the HBase services from Ambari.

I am sorry, I don't know what NN health is.

I don't think the timeout is caused by a slow start, but by:

FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)

Your advice: increasing to 2400000 ms => 40 minutes?

I am not really ready to let it try for 40 minutes before it reports again that it is impossible "for namespace to be assigned".

Super Collaborator

It's a corner-case issue that large HBase clusters do sometimes encounter. This was also raised with the Ambari team so that these values could be corrected and overridden to fairly moderate defaults. Please refer to the link below, where Ted Yu mentions the values to be updated:

https://issues.apache.org/jira/browse/AMBARI-16278

Along with what @gsharma suggested, please also set hbase.regionserver.executor.openregion.threads to "20" and then test again. This increases the number of concurrent threads that process region opening, which helps the regions initialize faster. Again, the values in Ambari are set to sensible thresholds and will work fine in most cases; only in corner cases do we run into this namespace initialization issue. If it still continues after that, please look at the logs of the specific region server to which the namespace region is being assigned to see what issues or errors show up there.

Contributor

It solves the problem for a cluster with 1 HBase Master and 16 Region Servers.

@Sumesh

However, hbase.regionserver.executor.openregion.threads = 20 did not solve the problem. So I read the associated Ambari issue, and it advises using 200, NOT 20.
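
For reference, the combination discussed in this thread (the larger namespace init timeout suggested above, plus the 200 open-region threads advised in AMBARI-16278) would look roughly like this in hbase-site.xml terms; in practice these were set through Ambari:

<property>
  <name>hbase.master.namespace.init.timeout</name>
  <value>2400000</value>
</property>
<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>200</value>
</property>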

Thanks for the help