Cannot restore hbase.id and hbase.version


We recently deployed a new HDP cluster with 3 master nodes and 5 worker nodes.
However, several components appear to have been placed on the wrong nodes, including the HBase service.
An HBase Master was installed on a worker node, and 3 HBase RegionServers are located on the 3 master nodes.

So we plan to relocate the misplaced HBase components (the HBase Master must be on a master node and the RegionServers must be on worker nodes).
Our first step was to decommission the 3 HBase RegionServers on the master nodes, but this could not be done because the HBase Master failed to start.

We then investigated the HBase Master log and found the following:

2021-11-30 13:42:24,379 ERROR [master/DCHDPD03:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hadoop.hbase.util.FileSystemVersionException: hbase.version file is missing. Is your hbase.rootdir valid? You can restore hbase.version file by running 'HBCK2 filesystem -fix'. See https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
	at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:452)
	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:275)
	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
	at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:124)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:865)
	at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2267)
	at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:586)
	at java.lang.Thread.run(Thread.java:745)
2021-11-30 13:42:24,380 ERROR [master/DCHDPD03:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master dchdpd03.dcdms,16000,1638254541977: Unhandled exception. Starting shutdown. *****
org.apache.hadoop.hbase.util.FileSystemVersionException: hbase.version file is missing. Is your hbase.rootdir valid? You can restore hbase.version file by running 'HBCK2 filesystem -fix'. See https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
	at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:452)
	at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:275)
	at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
	at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:124)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:865)
	at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2267)
	at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:586)
	at java.lang.Thread.run(Thread.java:745)
2021-11-30 13:42:24,380 INFO  [master/DCHDPD03:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'dchdpd03.dcdms,16000,1638254541977' *****
2021-11-30 13:42:24,380 INFO  [master/DCHDPD03:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/DCHDPD03:16000:becomeActiveMaster
2021-11-30 13:42:27,268 INFO  [master/DCHDPD03:16000] ipc.NettyRpcServer: Stopping server on /10.0.45.16:16000
2021-11-30 13:42:27,275 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: Stopping infoServer
2021-11-30 13:42:27,282 INFO  [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@47ac613b{/,null,UNAVAILABLE}{file:/usr/hdp/3.1.5.0-152/hbase/hbase-webapps/master}
2021-11-30 13:42:27,287 INFO  [master/DCHDPD03:16000] server.AbstractConnector: Stopped ServerConnector@727320fa{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}
2021-11-30 13:42:27,287 INFO  [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@37c41ec0{/static,file:///usr/hdp/3.1.5.0-152/hbase/hbase-webapps/static/,UNAVAILABLE}
2021-11-30 13:42:27,287 INFO  [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@77c233af{/logs,file:///var/log/hbase/,UNAVAILABLE}
2021-11-30 13:42:27,288 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: aborting server dchdpd03.dcdms,16000,1638254541977
2021-11-30 13:42:27,288 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: stopping server dchdpd03.dcdms,16000,1638254541977; all regions closed.
2021-11-30 13:42:27,288 INFO  [master/DCHDPD03:16000] hbase.ChoreService: Chore service for: master/DCHDPD03:16000 had [] on shutdown
2021-11-30 13:42:27,289 WARN  [master/DCHDPD03:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2021-11-30 13:42:27,291 INFO  [master/DCHDPD03:16000] zookeeper.ZooKeeper: Session: 0x37d5ab0a503000e closed
2021-11-30 13:42:27,291 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2021-11-30 13:42:27,291 INFO  [master/DCHDPD03:16000] regionserver.HRegionServer: Exiting; stopping=dchdpd03.dcdms,16000,1638254541977; zookeeper connection closed.
2021-11-30 13:42:27,291 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
	at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3109)
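Before running HBCK2, one way to confirm which marker files are actually missing is to list hbase.rootdir in HDFS and look for hbase.id and hbase.version. A minimal sketch, assuming the common HDP default hbase.rootdir of /apps/hbase/data (verify against hbase-site.xml); `missing_markers` and `HBASE_ROOTDIR` are hypothetical names, not part of HBase:

```shell
#!/usr/bin/env bash
# Sketch: report which HBase marker files are missing from hbase.rootdir.
# Assumption: hbase.rootdir is /apps/hbase/data (a common HDP default);
# check hbase.rootdir in hbase-site.xml for the actual value.
HBASE_ROOTDIR="${HBASE_ROOTDIR:-/apps/hbase/data}"

# missing_markers LISTING
# LISTING is the text printed by: hdfs dfs -ls "$HBASE_ROOTDIR"
# Prints each marker file name that is not the last path component
# of any listed entry.
missing_markers() {
  local listing="$1" f
  for f in hbase.id hbase.version; do
    # hdfs dfs -ls prints the full path as the last field of each entry
    if ! printf '%s\n' "$listing" |
      awk -v want="$f" '{ n = split($NF, p, "/"); if (n > 0 && p[n] == want) found = 1 }
                        END { exit !found }'; then
      echo "$f"
    fi
  done
}
```

Usage: `missing_markers "$(hdfs dfs -ls "$HBASE_ROOTDIR")"`, run as a user that can read the HBase root directory. It prints hbase.version when that file is absent, as the Master log above reports.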

 


The hbase.version file is missing, so we tried to recover it using HBCK2, but it failed with the errors below:

14:14:10.011 [main] WARN  org.apache.hadoop.hbase.client.ConnectionImplementation - Retrieve cluster id failed
java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/hbaseid
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_112]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_112]
        at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:549) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:287) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [?:1.8.0_112]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) [?:1.8.0_112]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [?:1.8.0_112]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) [?:1.8.0_112]
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:220) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:115) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hbase.HBCK2.connect(HBCK2.java:839) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:932) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.hbase.HBCK2.run(HBCK2.java:830) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
        at org.apache.hbase.HBCK2.main(HBCK2.java:1145) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
Caused by: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/hbaseid
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:177) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_112]
14:14:14.241 [main] INFO  org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception, tries=6, retries=36, started=4139 ms ago, cancelled=false, msg=java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=, see https://s.apache.org/timeout
...
...
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Tue Nov 30 14:14:10 WIB 2021, RpcRetryingCaller{globalStartTime=1638256450102, pause=100, maxAttempts=36}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
Tue Nov 30 14:14:10 WIB 2021, RpcRetryingCaller{globalStartTime=1638256450102, pause=100, maxAttempts=36}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
...
...
14:23:00.842 [ReadOnlyZKClient-dchdpm01.dcdms:2181,dchdpm02.dcdms:2181,dchdpm03.dcdms:2181@0x5c7933ad] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Session: 0x37d5ab0a5030012 closed
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
14:23:00.842 [ReadOnlyZKClient-dchdpm01.dcdms:2181,dchdpm02.dcdms:2181,dchdpm03.dcdms:2181@0x5c7933ad-EventThread] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - EventThread shut down
        at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3089)
        at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3081)
        at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterMetrics(HBaseAdmin.java:2117)
        at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:149)
        at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:933)
        at org.apache.hbase.HBCK2.run(HBCK2.java:830)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hbase.HBCK2.main(HBCK2.java:1145)
Caused by: org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
        at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1175)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1234)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1223)
        at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
        ... 9 more
Caused by: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
        at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2012)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.access$500(ConnectionImplementation.java:138)
        at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStubNoRetries(ConnectionImplementation.java:1136)
        at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1169)
        ... 13 more
Caused by: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:177)
        at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342)
        at java.lang.Thread.run(Thread.java:745)

It seems there are no znodes for /hbase-unsecure/hbaseid and /hbase-unsecure/master.
When we searched for these /hbase-unsecure/hbaseid and /hbase-unsecure/master errors, the answers said the HBase Master needs to be running, but in practice we still cannot start the HBase Master.

Is there any step we missed, or anything else we should troubleshoot?

Regards

Cloudera Employee

First, check in ZooKeeper whether there are entries for the HBase ID.
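For example, the ZooKeeper shell bundled with HBase (`hbase zkcli`) can list the parent znode and read the hbaseid znode. A minimal sketch, assuming zookeeper.znode.parent is /hbase-unsecure as in the logs above; `check_hbase_znodes`, `HBASE_BIN` and `ZNODE_PARENT` are hypothetical names used only for illustration:

```shell
#!/usr/bin/env bash
# Sketch: inspect the HBase znodes with the ZooKeeper shell shipped with HBase.
# Assumption: zookeeper.znode.parent is /hbase-unsecure (taken from the logs).
# HBASE_BIN is a hypothetical override for the hbase launcher; defaults to `hbase`.
HBASE_BIN="${HBASE_BIN:-hbase}"
ZNODE_PARENT="${ZNODE_PARENT:-/hbase-unsecure}"

check_hbase_znodes() {
  # List the children of the parent znode; a running cluster normally
  # has entries such as hbaseid, master and rs here
  "$HBASE_BIN" zkcli ls "$ZNODE_PARENT"
  # Print the stored cluster ID, if the hbaseid znode exists
  "$HBASE_BIN" zkcli get "$ZNODE_PARENT/hbaseid"
}
```

Run `check_hbase_znodes` from a node with the HBase client configured. If master and hbaseid are absent from the listing, that is consistent with the NoNode errors in the question.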

 

Another easy way is to wipe the slate clean with:

bin/hbase clean

Use the --cleanAll option, which deletes both the HDFS data and the ZooKeeper data.

This should clean things up and get the cluster going again.

** Make sure the HBase service is stopped while you do this.

OR 

You can use the --cleanZk option to delete only the ZooKeeper data and let it be repopulated. The steps are the same: bring down the HBase service and run the command from an admin/master node.

** These actions cannot be reverted.
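Because both options are irreversible, one way to avoid running the wrong one by accident is a small wrapper that asks for confirmation first. A minimal sketch, assuming the HBase service has already been stopped (for example from Ambari); `clean_hbase` and `HBASE_BIN` are hypothetical names, not part of HBase:

```shell
#!/usr/bin/env bash
# Sketch: run `hbase clean` only after explicit confirmation, because the
# cleanup cannot be undone. Stop the HBase service before calling this.
# HBASE_BIN is a hypothetical override for the hbase launcher; defaults to `hbase`.
HBASE_BIN="${HBASE_BIN:-hbase}"

# clean_hbase MODE
#   --cleanZk   delete only the HBase data in ZooKeeper
#   --cleanAll  delete the HBase data in both HDFS and ZooKeeper
clean_hbase() {
  local answer
  case "$1" in
    --cleanZk|--cleanAll) ;;
    *) echo "usage: clean_hbase --cleanZk|--cleanAll" >&2; return 2 ;;
  esac
  # Prompt on stderr so stdout only carries the output of hbase clean
  printf 'This runs "%s clean %s" and cannot be reverted. Type YES to continue: ' \
    "$HBASE_BIN" "$1" >&2
  read -r answer
  if [ "$answer" != "YES" ]; then
    echo "aborted" >&2
    return 1
  fi
  "$HBASE_BIN" clean "$1"
}
```

For example, `clean_hbase --cleanZk` limits the cleanup to ZooKeeper, which is the less destructive of the two options described above.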