HBase namespace table is not online

My hbase:namespace system table is not online, and as a result I'm seeing these messages in the HBase master log:

 

2021-03-17 20:29:54,614 WARN  [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
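
For reference, the region in this message is identified by its encoded name, the 32-character hex string after the last dot of the region name (it also appears inside `state={...}`). As far as I know, HBCK2 commands such as `assigns` refer to regions by that encoded name. Below is a minimal shell sketch for pulling it out of a master log line. `encoded_region_name` is a hypothetical helper name, and the pattern assumes the log format shown above.

```shell
#!/usr/bin/env bash
# Extract the encoded region name (32 hex characters) from a master log line
# of the form "<table>,<startkey>,<timestamp>.<encoded>. is NOT online".
# encoded_region_name is a hypothetical helper, not part of HBase.
encoded_region_name() {
  printf '%s\n' "$1" | sed -n 's/.*\.\([0-9a-f]\{32\}\)\. is NOT online.*/\1/p'
}

log_line='2021-03-17 20:29:54,614 WARN  [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false.'

# Prints the encoded name, or nothing if the line does not match.
encoded_region_name "$log_line"
```

Lines that do not match the pattern produce no output, so the helper can be run over a whole log file with a loop or `while read`.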

 

I came across this article for fixing this problem:

 

https://docs.cloudera.com/runtime/7.2.7/troubleshooting-hbase/topics/hbase_running_hbck2.html

 

However, when I follow the article and run the suggested command, I get a “Failed to specify server's Kerberos principal name” error. I need clarification on the following two points:

 

  1. Is a specific invocation required to run the hbck2 utility on a kerberized cluster, i.e. does the principal need to be passed as an external parameter? I also tried passing the HBase configuration with the --config option, which was not accepted.
  2. Has anyone else faced a similar issue with an HBase system table and fixed it using a different approach?

==========================================

 

[root@itk-phx-prod-edge-1 ~]# kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase

[root@itk-phx-prod-edge-1 ~]# klist

Ticket cache: FILE:/tmp/krb5cc_0

Default principal: hbase@PROD.DATALAKE.PHX

 

Valid starting       Expires              Service principal

03/18/2021 16:45:53  03/19/2021 16:45:53  krbtgt/PROD.DATALAKE.PHX@PROD.DATALAKE.PHX

 

===========================================

 

 

[root@itk-phx-prod-edge-1 target]# hbase hbck -j hbase-hbck2-1.2.0-SNAPSHOT.jar -s assigns hbase:namespace 1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/root/hbase-hbck2/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.2.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

16:47:07.894 [main] INFO  org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Connect 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms

16:47:07.962 [ReadOnlyZKClient-itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181@0x560348e6-SendThread(itk-phx-prod-zk-2.datalake.phx:2181)] WARN  org.apache.zookeeper.ClientCnxn - SASL configuration failed: javax.security.auth.login.LoginException: Zookeeper client cannot authenticate using the Client section of the supplied JAAS configuration: '/usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf' because of a RuntimeException: java.lang.SecurityException: java.io.IOException: /usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf (No such file or directory) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.

16:47:08.253 [main] INFO  org.apache.hbase.HBCK2 - Skipped assigns command version check; 'skip' set

16:47:08.838 [main] INFO  org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Close zookeeper connection 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181

Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:111)

at org.apache.hbase.HBCK2.assigns(HBCK2.java:308)

at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:819)

at org.apache.hbase.HBCK2.run(HBCK2.java:777)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)

at org.apache.hbase.HBCK2.main(HBCK2.java:1067)

Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336)

at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95)

at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)

at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)

at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:106)

... 6 more

Caused by: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:185)

 

I can attach the complete HBase master log as well if that helps.

1 ACCEPTED SOLUTION

Hello @Priyanka26 

 

As we haven't heard back from you, we are summarising the discussion in this post so that it benefits users with similar issues:

 

PROBLEM: In HDP v3.1.0, the hbase:namespace region isn't assigned, which causes the following message in the HBase master log:

2021-03-17 20:29:54,614 WARN  [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.

Your team tried to use the HBCK2 assigns command, but it fails with the following error:

Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
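
As far as I know, this message is generally raised on the client side when the client cannot determine the server's Kerberos principal for the RPC it is making. Before attributing the failure to anything else, a quick sanity check is to confirm that the client configuration defines the server principals. Here is a minimal sketch, assuming the client reads `hbase-site.xml` from `/etc/hbase/conf` (or `$HBASE_CONF_DIR`). `has_property` is a hypothetical helper, and the two property names are the standard HBase security settings. This check only rules out a configuration gap; it does not work around a product bug.

```shell
#!/usr/bin/env bash
# Report whether an hbase-site.xml file defines a given property name.
# has_property is a hypothetical helper; the config path is an assumption.
has_property() {
  local file="$1" name="$2"
  grep -qF "<name>${name}</name>" "$file"
}

site="${HBASE_CONF_DIR:-/etc/hbase/conf}/hbase-site.xml"
for prop in hbase.master.kerberos.principal hbase.regionserver.kerberos.principal; do
  if [ -r "$site" ] && has_property "$site" "$prop"; then
    echo "$prop: present in $site"
  else
    echo "$prop: not found in $site"
  fi
done
```

If a different configuration directory is needed, the `hbase` launcher script accepts it before the subcommand, for example `hbase --config /etc/hbase/conf hbck -j <jar> ...`, rather than as an option to HBCK2 itself.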

 

DISCUSSION SUMMARY: (I) HDP v3.1.0 has a bug wherein the HBCK2 JAR can't be used with the shipped hbase-client and hbase-server JARs in a secure (Kerberized) cluster. There is no issue with the way your team is using HBCK2; the bug is what causes the HBCK2 JAR to throw this exception. Without the modified hbase-client and hbase-server JARs, we can try to re-initialize the HBase cluster, but only if it isn't a production cluster.

(II) The modified JARs aren't publicly available for download. Unfortunately, I am not aware of any alternative other than manual intervention (starting HBase on a new data directory and bulk-loading from the previous data directory being one option). This issue isn't present in HDP v3.1.5 onwards.

(III) Your team decided to use the bulk-load approach so that HBase is initialised afresh. [1] shares the steps used by your team.
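
For readers unfamiliar with the bulk-load approach, the rough shape is: start HBase on a fresh data directory, recreate each table with the same column families, then load the HFiles from each region directory of the previous data directory. The sketch below only prints the commands rather than running them. It assumes the HBase 2.x `org.apache.hadoop.hbase.tool.LoadIncrementalHFiles` tool; every path and table name is a hypothetical placeholder, and `bulkload_commands` is a hypothetical helper. The exact steps your team used are in [1], and this is not a substitute for them.

```shell
#!/usr/bin/env bash
# Print (do not run) one bulk-load command per region directory of a table
# kept under the previous HBase data directory.
# bulkload_commands and all paths/table names here are hypothetical.
bulkload_commands() {
  local table="$1"; shift
  local region_dir
  for region_dir in "$@"; do
    echo "hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles ${region_dir} ${table}"
  done
}

# Example: two region directories of table "mytable" under an old root dir.
bulkload_commands mytable \
  /apps/hbase/data_old/data/default/mytable/region_a \
  /apps/hbase/data_old/data/default/mytable/region_b
```

The tool expects a directory whose subdirectories are column families, so non-family entries inside a region directory may need to be excluded first; treat that as something to verify on a test table. Reviewing the printed commands before executing them keeps the load auditable, and on a Kerberized cluster they would need a valid ticket for a user allowed to write to the target table.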

 

In short, please upgrade to HDP v3.1.5 (a maintenance upgrade from v3.1.0) as soon as possible. Until then, such issues require bulk-loading. The bug causing the HBCK2 issue in a Kerberized environment affects HDP v3.0.0 through v3.1.4 inclusive and is fixed in HDP v3.1.5.

 

Thanks again for using Cloudera Community.

 

- Smarak

 

[1] https://community.cloudera.com/t5/Support-Questions/Hbase-namespace-table-in-not-online/m-p/313460/h...
