Created 12-28-2016 11:48 AM
Hello,
I have a cluster with HDP 2.5 installed. The HBase master starts and then shuts down after 5 minutes (300000 ms), with a stack trace that seems different from the ones discussed in other threads on this forum, which is why I decided to post this:
Here are the errors that fill the master logs:
2016-12-28 12:21:57,531 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=854.48 KB, freeSize=811.72 MB, max=812.55 MB, blockCount=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=29, evicted=0, evictedPerRun=0.0
2016-12-28 12:22:05,269 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
2016-12-28 12:22:05,271 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.backup.master.BackupController]
2016-12-28 12:22:05,271 FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
2016-12-28 12:22:05,271 INFO [cluster1-node4:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
The full HBase master log is here: hbase-master.txt
Thanks for any help that may be provided
Created 12-28-2016 11:59 AM
What other activity is happening before this exception? Is splitting of the WAL taking place? Have you checked NN health and its logs during the same time period?
Created 12-28-2016 12:09 PM
Try increasing hbase.master.namespace.init.timeout to a bigger value, say 2400000.
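In case it helps, a minimal sketch of what that could look like in hbase-site.xml (on an Ambari-managed cluster the property would normally be added through the HBase configuration screens rather than by editing the file by hand):

  <!-- Sketch only: raise the timeout the HMaster waits for the namespace table to be assigned -->
  <property>
    <name>hbase.master.namespace.init.timeout</name>
    <value>2400000</value>   <!-- 2400000 ms = 40 minutes; the default is 300000 ms (5 minutes) -->
  </property>

The HBase master has to be restarted for the new value to take effect.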
Created 12-28-2016 12:30 PM
No specific activity; I am simply restarting the HBase services from Ambari.
I am sorry, I don't know what NN health is.
I don't think the timeout is caused by a slow start, but by:
FATAL [cluster1-node4:16000.activeMasterManager] master.HMaster: Failed to become active master
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1061)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:840)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
        at java.lang.Thread.run(Thread.java:745)
Your advice: increase it to 2400000 ms, i.e. 40 minutes???
I am not really ready to let it try for 40 minutes before it gives up and reports again that the namespace table cannot be assigned.
Created 12-28-2016 11:03 PM
It is kind of a corner-case issue that large HBase clusters sometimes encounter. This was also raised with the Ambari team to get these values corrected and overridden to fairly moderate defaults. Please refer to the link below, where Ted Yu has mentioned the values to be updated:
https://issues.apache.org/jira/browse/AMBARI-16278
Along with what was suggested by @gsharma, please also update hbase.regionserver.executor.openregion.threads to "20" and then test again. This increases the number of concurrent threads available to process region opens, which helps the regions initialize faster. Again, the values set by Ambari are reasonable thresholds and work fine in most cases; only in corner cases do we run into this "namespace" initialization issue. If it still continues after that, please look at the logs of the specific region server where the "namespace" region is being assigned to see what issues or errors show up there.
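For reference, a rough sketch of the two properties discussed in this thread as they might appear together in hbase-site.xml (values are the ones suggested above):

  <!-- Sketch only: combined settings suggested in this thread -->
  <property>
    <name>hbase.master.namespace.init.timeout</name>
    <value>2400000</value>   <!-- give the namespace table longer to be assigned -->
  </property>
  <property>
    <name>hbase.regionserver.executor.openregion.threads</name>
    <value>20</value>        <!-- more concurrent region-open handler threads on each region server -->
  </property>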
Created 12-29-2016 07:47 AM
This solved the problem for a cluster with 1 HBase Master and 16 Region Servers.
However, hbase.regionserver.executor.openregion.threads = 20 did not solve the problem, so I read the associated Ambari issue, and it advises using 200, NOT 20.
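For reference, the corrected value as advised in AMBARI-16278 (same hbase-site.xml sketch as above):

  <property>
    <name>hbase.regionserver.executor.openregion.threads</name>
    <value>200</value>   <!-- 200 as advised in AMBARI-16278, not 20 -->
  </property>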
Thanks for the help