
Cannot restore hbase.id and hbase.version

Explorer

We recently deployed a new HDP cluster with 3 master nodes and 5 worker nodes.
However, several components appear to have been misplaced, including the HBase service.
An HBase Master was installed on a worker node, and 3 HBase RegionServers are located on the 3 master nodes.

We therefore plan to relocate the misplaced HBase components (the HBase Master must run on a master node and the RegionServers must run on worker nodes).
Our first step is to decommission the 3 HBase RegionServers on the master nodes, but this cannot be done because the HBase Master fails to start.

We then investigated the HBase Master log and found:

 

2021-11-30 13:42:24,379 ERROR [master/DCHDPD03:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hadoop.hbase.util.FileSystemVersionException: hbase.version file is missing. Is your hbase.rootdir valid? You can restore hbase.version file by running 'HBCK2 filesystem -fix'. See https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
	at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:452)
	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:275)
	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
	at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:124)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:865)
	at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2267)
	at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:586)
	at java.lang.Thread.run(Thread.java:745)
2021-11-30 13:42:24,380 ERROR [master/DCHDPD03:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master dchdpd03.dcdms,16000,1638254541977: Unhandled exception. Starting shutdown. *****
org.apache.hadoop.hbase.util.FileSystemVersionException: hbase.version file is missing. Is your hbase.rootdir valid? You can restore hbase.version file by running 'HBCK2 filesystem -fix'. See https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
	at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:452)
	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:275)
	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
	at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:124)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:865)
	at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2267)
	at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:586)
	at java.lang.Thread.run(Thread.java:745)
2021-11-30 13:42:24,380 INFO  [master/DCHDPD03:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'dchdpd03.dcdms,16000,1638254541977' *****
2021-11-30 13:42:24,380 INFO  [master/DCHDPD03:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/DCHDPD03:16000:becomeActiveMaster
2021-11-30 13:42:27,268 INFO  [master/DCHDPD03:16000] ipc.NettyRpcServer: Stopping server on /10.0.45.16:16000
2021-11-30 13:42:27,275 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: Stopping infoServer
2021-11-30 13:42:27,282 INFO  [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@47ac613b{/,null,UNAVAILABLE}{file:/usr/hdp/3.1.5.0-152/hbase/hbase-webapps/master}
2021-11-30 13:42:27,287 INFO  [master/DCHDPD03:16000] server.AbstractConnector: Stopped ServerConnector@727320fa{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}
2021-11-30 13:42:27,287 INFO  [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@37c41ec0{/static,file:///usr/hdp/3.1.5.0-152/hbase/hbase-webapps/static/,UNAVAILABLE}
2021-11-30 13:42:27,287 INFO  [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@77c233af{/logs,file:///var/log/hbase/,UNAVAILABLE}
2021-11-30 13:42:27,288 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: aborting server dchdpd03.dcdms,16000,1638254541977
2021-11-30 13:42:27,288 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: stopping server dchdpd03.dcdms,16000,1638254541977; all regions closed.
2021-11-30 13:42:27,288 INFO  [master/DCHDPD03:16000] hbase.ChoreService: Chore service for: master/DCHDPD03:16000 had [] on shutdown
2021-11-30 13:42:27,289 WARN  [master/DCHDPD03:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2021-11-30 13:42:27,291 INFO  [master/DCHDPD03:16000] zookeeper.ZooKeeper: Session: 0x37d5ab0a503000e closed
2021-11-30 13:42:27,291 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2021-11-30 13:42:27,291 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: Exiting; stopping=dchdpd03.dcdms,16000,1638254541977; zookeeper connection closed.
2021-11-30 13:42:27,291 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
	at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3109)

 


The hbase.version file is missing, so we tried to recover it using HBCK2, but it failed with the error below:

 

14:14:10.011 [main] WARN  org.apache.hadoop.hbase.client.ConnectionImplementation - Retrieve cluster id failed
java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/hbaseid
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_112]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_112]
        at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:549) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:287) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [?:1.8.0_112]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) [?:1.8.0_112]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [?:1.8.0_112]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) [?:1.8.0_112]
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:220) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:115) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hbase.HBCK2.connect(HBCK2.java:839) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:932) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.hbase.HBCK2.run(HBCK2.java:830) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
        at org.apache.hbase.HBCK2.main(HBCK2.java:1145) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
Caused by: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/hbaseid
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:177) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_112]
14:14:14.241 [main] INFO  org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception, tries=6, retries=36, started=4139 ms ago, cancelled=false, msg=java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=, see https://s.apache.org/timeout
...
...
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Tue Nov 30 14:14:10 WIB 2021, RpcRetryingCaller{globalStartTime=1638256450102, pause=100, maxAttempts=36}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
Tue Nov 30 14:14:10 WIB 2021, RpcRetryingCaller{globalStartTime=1638256450102, pause=100, maxAttempts=36}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
...
...
14:23:00.842 [ReadOnlyZKClient-dchdpm01.dcdms:2181,dchdpm02.dcdms:2181,dchdpm03.dcdms:2181@0x5c7933ad] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Session: 0x37d5ab0a5030012 closed
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
14:23:00.842 [ReadOnlyZKClient-dchdpm01.dcdms:2181,dchdpm02.dcdms:2181,dchdpm03.dcdms:2181@0x5c7933ad-EventThread] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - EventThread shut down
        at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3089)
        at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3081)
        at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterMetrics(HBaseAdmin.java:2117)
        at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:149)
        at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:933)
        at org.apache.hbase.HBCK2.run(HBCK2.java:830)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hbase.HBCK2.main(HBCK2.java:1145)
Caused by: org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
        at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1175)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1234)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1223)
        at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
        ... 9 more
Caused by: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
        at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2012)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.access$500(ConnectionImplementation.java:138)
        at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStubNoRetries(ConnectionImplementation.java:1136)
        at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1169)
        ... 13 more
Caused by: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:177)
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342)
        at java.lang.Thread.run(Thread.java:745)

 

 

It seems there are no znodes for /hbase-unsecure/hbaseid and /hbase-unsecure/master.
When we searched for the /hbase-unsecure/hbaseid and /hbase-unsecure/master errors, the answers said the HBase Master needs to be active, but in practice we still cannot start the HBase Master.

Is there any step we missed, or any other troubleshooting we can try?

Regards


Cloudera Employee

First, check in ZooKeeper whether there are entries for the HBase ID.
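A minimal Python sketch of that ZooKeeper check. It assumes `hbase zkcli` accepts a single command (here `ls`) as arguments and prints the children as a bracketed list such as `[hbaseid, master, rs]`; if it does not on your version, run `ls /hbase-unsecure` in the interactive `hbase zkcli` shell instead. The parent znode /hbase-unsecure comes from the logs in the question; `run_capture`, `parse_ls_output` and `missing_znodes` are hypothetical helper names.

```python
import subprocess
from typing import Callable, List, Sequence

# Parent znode seen in the logs (zookeeper.znode.parent on this cluster).
ZNODE_PARENT = "/hbase-unsecure"
# Children the client looked for and could not find.
EXPECTED_CHILDREN = ("hbaseid", "master")


def run_capture(args: Sequence[str]) -> str:
    """Run a command and return its combined stdout and stderr text."""
    proc = subprocess.run(list(args), capture_output=True, text=True, check=False)
    return proc.stdout + proc.stderr


def parse_ls_output(output: str) -> List[str]:
    """Extract the children from the last '[a, b, c]' line of `ls` output.

    Log and WATCHER lines around the result are ignored; if no bracketed
    line is found (e.g. 'Node does not exist'), an empty list is returned.
    """
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            inner = line[1:-1].strip()
            return [child.strip() for child in inner.split(",")] if inner else []
    return []


def missing_znodes(
    parent: str = ZNODE_PARENT,
    runner: Callable[[Sequence[str]], str] = run_capture,
) -> List[str]:
    """Return which of the expected child znodes are absent under `parent`."""
    children = parse_ls_output(runner(["hbase", "zkcli", "ls", parent]))
    return [name for name in EXPECTED_CHILDREN if name not in children]
```

With the state described in the question, `missing_znodes()` would be expected to report both `hbaseid` and `master`, since neither znode is created until an HBase Master becomes active.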

 

There is another, easier way to wipe the slate clean:

 

bin/hbase clean

 

Select the --cleanAll option, which deletes the HBase data in HDFS (everything under hbase.rootdir) as well as the HBase data in ZooKeeper.

This should clean things up and get the service going again.

 

** Make sure the HBase service is stopped when you do this.

OR 

You can use the --cleanZk option to delete only the ZooKeeper data and let it be repopulated. The steps remain the same: bring down the HBase service and run these commands from the admin/master nodes.

** These actions cannot be reverted.
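To make the choice between the two options explicit, here is a small Python sketch that wraps `hbase clean`, accepts only the two options discussed in this reply, and refuses to run unless the caller confirms HBase is stopped. It assumes `hbase` is on the PATH of the admin/master node; `build_clean_command`, `run_clean` and `run_cli` are hypothetical names, and the `hbase_stopped` flag is only a guard the operator sets, it does not inspect the service state itself.

```python
import subprocess
from typing import Callable, List, Sequence

# The two `hbase clean` options discussed in this thread.
CLEAN_OPTIONS = {
    "--cleanZk": "delete only HBase's ZooKeeper data (repopulated on restart)",
    "--cleanAll": "delete HBase's HDFS data under hbase.rootdir and its ZooKeeper data",
}


def run_cli(args: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(args), check=False).returncode


def build_clean_command(option: str) -> List[str]:
    """Return the argv for `hbase clean`, rejecting unknown options."""
    if option not in CLEAN_OPTIONS:
        raise ValueError(
            f"unsupported option {option!r}; expected one of {sorted(CLEAN_OPTIONS)}"
        )
    return ["hbase", "clean", option]


def run_clean(
    option: str,
    *,
    hbase_stopped: bool,
    runner: Callable[[Sequence[str]], int] = run_cli,
) -> int:
    """Run `hbase clean <option>`; irreversible, so HBase must be stopped first."""
    cmd = build_clean_command(option)
    if not hbase_stopped:
        raise RuntimeError("Stop the HBase service before running `hbase clean`.")
    return runner(cmd)
```

For example, `run_clean("--cleanZk", hbase_stopped=True)` would invoke `hbase clean --cleanZk` and return its exit code. Rejecting a single-dash spelling such as `-cleanAll` early avoids running the script with an option it does not recognize.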