Production master not coming up

Explorer

Hi Team,

The prod master node is not coming up. I am getting the error below. Could you please tell me how to resolve the issue? The data is very important.


2023-01-23 09:40:50,748 ERROR [master/ctrlsu-hbaseRS1:16000:becomeActiveMaster] master.HMaster: Failed to become active master

java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED

at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:379)

at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:319)

at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1324)

at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1055)

at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2184)

at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:519)

at java.base/java.lang.Thread.run(Thread.java:829)

Caused by: java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned and enabled: tableName=hbase:namespace, state=ENABLED

at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107)

at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)

at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:249)

at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1322)

... 4 more

2023-01-23 09:40:50,749 ERROR [master/ctrlsu-hbaseRS1:16000:becomeActiveMaster] master.HMaster: Master server abort: loaded coprocessors are: []

2023-01-23 09:40:50,749 ERROR [master/ctrlsu-hbaseRS1:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master ctrlsu-hbasers1,16000,1674446742141: Unhandled exception. Starting shutdown. *****

java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED

at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:379)

at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:319)

at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1324)

at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1055)

at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2184)

at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:519)

at java.base/java.lang.Thread.run(Thread.java:829)

Caused by: java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned and enabled: tableName=hbase:namespace, state=ENABLED

at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:107)

at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)

at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:249)

at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1322)

1 ACCEPTED SOLUTION

Super Collaborator

Hello @RammiSE 

 

This reply is a bit late, but I am posting it anyway. If your team has already resolved the issue, we would appreciate you sharing the details in this post for the wider audience.

For the HMaster to initialise, the "hbase:meta" and "hbase:namespace" table regions need to be online. In your previous reply, the HMaster reports that "hbase:meta" isn't online [1]. So use the HBCK2 JAR to assign the "hbase:meta" region "1588230740" first, then check (via the HBase UI) whether regions are being assigned successfully. The "hbase:namespace" table region may report a similar trace, in which case your team needs to use the HBCK2 JAR to assign the "hbase:namespace" region as well. Restarting the HMaster after a manual HBCK2 assign isn't always required, but it does no harm either.

 

Regards, Smarak

 

[1] 

2023-01-23 16:05:34,990 WARN  [master/ctrlsu-hbaseMS:16000:becomeActiveMaster] master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1674468867063, server=hadoop-datanode2,16020,1674362337687}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined
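The order described above (hbase:meta first, then hbase:namespace) can be sketched as a small shell helper. This is only a sketch under assumptions: the function name `hbck2_assign` and the `HBCK2_JAR` and `DRY_RUN` variables are hypothetical, the JAR path is the one quoted elsewhere in this thread, and `f0b4865fe8ea07321ed8eb237a592c10` is the region ID the poster used, presumed here to be the hbase:namespace region. By default it only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print (default) or run HBCK2 "assigns" commands.
# HBCK2_JAR: path to the HBCK2 JAR (default is the path quoted in this thread).
# DRY_RUN:   1 = only print the command, 0 = actually run it.
HBCK2_JAR="${HBCK2_JAR:-/tmp/hbase-hbck2-1.2.0.jar}"
DRY_RUN="${DRY_RUN:-1}"

hbck2_assign() {
  # All arguments are passed through, so "-o <region>" (override) also works.
  if [ "$DRY_RUN" = "1" ]; then
    echo "hbase hbck -j $HBCK2_JAR assigns $*"
  else
    hbase hbck -j "$HBCK2_JAR" assigns "$@"
  fi
}

# 1. hbase:meta first (its encoded region name is always 1588230740).
# 2. Only if that succeeds, the region presumed to be hbase:namespace.
hbck2_assign 1588230740 &&
  hbck2_assign f0b4865fe8ea07321ed8eb237a592c10
```

Run it once as is to review the two commands, then with `DRY_RUN=0` to execute them. Checking the HBase UI between the two assigns, as suggested above, is still worthwhile, because the second assign needs hbase:meta to be readable.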


8 REPLIES

Super Collaborator

Hi @RammiSE, based on the exception, the hbase:namespace table is not online. You will need to assign the namespace region to bring up the HBase Master.

 

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_hbase_hbck.html

 

~~~

Caused by: java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned and enabled: tableName=hbase:namespace, state=ENABLED

Explorer

@rki_ I am getting this error after running assigns:

 

2023-01-23 16:04:18,310 INFO  [hconnection-0x6be3e1e2-shared-pool-1] client.RpcRetryingCallerImpl: Server.getRegionByEncodedName(HRegionServer.java:3462)

at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3439)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1488)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3182)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3558)

at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819)

at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)

at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)

, details=row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hadoop-datanode2,16020,1674362337687, seqNum=-1, see https://s.apache.org/timeout

Super Collaborator

@RammiSE You will need to find the encoded region ID of the namespace region in the HBase Master log and then assign that region using the HBCK2 jar.
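One way to pull the region ID out of the master log is sketched below. It assumes the log contains "is NOT online" lines in the same format as the hbase:meta WARN line quoted in this thread, and that the hbase:namespace region is reported the same way; the function name `not_online_regions` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read HMaster log lines on stdin and print
# "<table> <encoded-region-id>" for every "is NOT online" line.
# Assumed region name layout: <table>,<start key>,<timestamp>.<encoded id>
# with an optional trailing dot (hbase:meta has none).
not_online_regions() {
  sed -nE 's/.*: (hbase:[a-z]+),[^ ]*\.([0-9a-f]+)\.? is NOT online.*/\1 \2/p'
}
```

Feed it the master log on stdin, for example `not_online_regions < "$MASTER_LOG" | sort -u`, where `MASTER_LOG` is a placeholder for the path to your HMaster log. The second column is the value to pass to the HBCK2 `assigns` command.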

Explorer

@rki_ Getting this error after executing the command "hbase hbck -j .jar assigns f0b4865fe8ea07321ed8eb237a592c10" 

 

2023-01-23 16:04:38,448 INFO  [hconnection-0x6be3e1e2-shared-pool-1] client.RpcRetryingCallerImpl: Server.getRegionByEncodedName(HRegionServer.java:3462)

at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3439)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1488)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3182)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3558)

at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819)

at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)

at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)

, details=row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hadoop-datanode2,16020,1674362337687, seqNum=-1, see https://s.apache.org/timeout

2023-01-23 16:05:34,990 WARN  [master/ctrlsu-hbaseMS:16000:becomeActiveMaster] master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1674468867063, server=hadoop-datanode2,16020,1674362337687}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.

Super Collaborator

@RammiSE Try the following:

 

./hbase hbck -j /tmp/hbase-hbck2-1.2.0.jar assigns -o f0b4865fe8ea07321ed8eb237a592c10

Explorer

@rki_ I am executing this command

"hbase hbck -j /tmp/hbase-hbck2-1.2.0.jar assigns f0b4865fe8ea07321ed8eb237a592c"

and getting the error below. Please guide me on the next steps.

 

Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownRegionException): org.apache.hadoop.hbase.UnknownRegionException: Error trying to load region f0b4865fe8ea07321ed8eb237a592c10 from META

at org.apache.hadoop.hbase.master.assignment.AssignmentManager.loadRegionFromMeta(AssignmentManager.java:1646)

at org.apache.hadoop.hbase.master.MasterRpcServices.getRegionInfo(MasterRpcServices.java:2581)

at org.apache.hadoop.hbase.master.MasterRpcServices.assigns(MasterRpcServices.java:2615)

at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$2.callBlockingMethod(MasterProtos.java)

at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)

at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=46, exceptions:

2023-01-23T10:34:38.453Z, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68451: org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online on hadoop-datanode2,16020,1674468863385

at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3462)

at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3439)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1488)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3182)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3558)

at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819)

at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)

at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)

row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hadoop-datanode2,16020,1674362337687, seqNum=-1

 

Explorer

@rki_ Getting this error after executing the command "hbase hbck -j jar assigns f0b4865fe8ea07321ed8eb237a592c" 

 

2023-01-23 16:04:38,448 INFO  [hconnection-0x6be3e1e2-shared-pool-1] client.RpcRetryingCallerImpl: Server.getRegionByEncodedName(HRegionServer.java:3462)

at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3439)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1488)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:3182)

at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3558)

at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45819)

at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)

at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)

at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)

, details=row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hadoop-datanode2,16020,1674362337687, seqNum=-1, see https://s.apache.org/timeout

2023-01-23 16:05:34,990 WARN  [master/ctrlsu-hbaseMS:16000:becomeActiveMaster] master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1674468867063, server=hadoop-datanode2,16020,1674362337687}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
