
HBase namespace table is not online

Explorer

I have a situation where my namespace system table is not online, and because of that I'm seeing these messages in the HBase Master log:

 

2021-03-17 20:29:54,614 WARN  [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.

 

I came across this article for fixing this problem:

 

https://docs.cloudera.com/runtime/7.2.7/troubleshooting-hbase/topics/hbase_running_hbck2.html

 

But while following the article and running the suggested command, I ran into a "Failed to specify server's Kerberos principal name" error. I need clarification on the following two points:

 

  1. Do we need any specific format to run the hbck2 utility if the cluster is Kerberized, i.e. does the principal need to be passed as an external parameter? (See the sketch after this list for the form I would expect.) I even tried passing HBase configurations with the --config option, which wasn't accepted.
  2. Has anyone else faced a similar issue with the HBase system table and fixed it using a different approach?
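
For reference, the form I would expect to work is sketched below. This is an untested sketch: the principal values are guesses based on our realm and would need to match hbase.master.kerberos.principal / hbase.regionserver.kerberos.principal in the cluster's hbase-site.xml, and I'm assuming the -D properties get forwarded, since the stack trace further down shows HBCK2 running through Hadoop's ToolRunner.

# Untested sketch: pass the server principals as -D properties, assuming the
# hbase wrapper forwards them through ToolRunner's generic options parsing.
# Note that 'assigns' takes the encoded region name on its own.
hbase hbck -j hbase-hbck2-1.2.0-SNAPSHOT.jar \
  -Dhbase.master.kerberos.principal=hbase/_HOST@PROD.DATALAKE.PHX \
  -Dhbase.regionserver.kerberos.principal=hbase/_HOST@PROD.DATALAKE.PHX \
  -s assigns 0c72d4be7e562a2ec8a86c3ec830bdc5

# The ZooKeeper SASL warning in the output below also points at a missing
# region-server JAAS file; a client-side JAAS file could be supplied instead
# (the path here is hypothetical):
export HBASE_OPTS="$HBASE_OPTS -Djava.security.auth.login.config=/etc/hbase/conf/hbase_client_jaas.conf"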

==========================================

 

[root@itk-phx-prod-edge-1 ~]# kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase

[root@itk-phx-prod-edge-1 ~]# klist

Ticket cache: FILE:/tmp/krb5cc_0

Default principal: hbase@PROD.DATALAKE.PHX

 

Valid starting       Expires              Service principal

03/18/2021 16:45:53  03/19/2021 16:45:53  krbtgt/PROD.DATALAKE.PHX@PROD.DATALAKE.PHX

 

===========================================

 

 

[root@itk-phx-prod-edge-1 target]# hbase hbck -j hbase-hbck2-1.2.0-SNAPSHOT.jar -s assigns hbase:namespace 1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/root/hbase-hbck2/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.2.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

16:47:07.894 [main] INFO  org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Connect 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms

16:47:07.962 [ReadOnlyZKClient-itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181@0x560348e6-SendThread(itk-phx-prod-zk-2.datalake.phx:2181)] WARN  org.apache.zookeeper.ClientCnxn - SASL configuration failed: javax.security.auth.login.LoginException: Zookeeper client cannot authenticate using the Client section of the supplied JAAS configuration: '/usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf' because of a RuntimeException: java.lang.SecurityException: java.io.IOException: /usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf (No such file or directory) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.

16:47:08.253 [main] INFO  org.apache.hbase.HBCK2 - Skipped assigns command version check; 'skip' set

16:47:08.838 [main] INFO  org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Close zookeeper connection 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181

Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:111)

at org.apache.hbase.HBCK2.assigns(HBCK2.java:308)

at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:819)

at org.apache.hbase.HBCK2.run(HBCK2.java:777)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)

at org.apache.hbase.HBCK2.main(HBCK2.java:1067)

Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336)

at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95)

at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)

at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)

at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:106)

... 6 more

Caused by: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:185)

 

I can attach the complete HBase Master log as well if that helps.
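
In case it's useful, this is how I'd verify whether the client-side configuration defines the server principals at all (assuming the standard HDP client config location; the path may differ on your cluster):

# Check whether the client hbase-site.xml carries the server principal settings
grep -A1 "kerberos.principal" /etc/hbase/conf/hbase-site.xml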

1 ACCEPTED SOLUTION

Super Collaborator

Hello @Priyanka26 

 

As we haven't heard back from your side, we shall summarise the discussion in this post to ensure it benefits users with similar experiences:

 

PROBLEM: In HDP v3.1.0, the HBase namespace region isn't assigned, causing the following message:

2021-03-17 20:29:54,614 WARN  [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.

Your team tried to use the HBCK2 assigns command, yet it fails with the following error:

Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

 

DISCUSSION SUMMARY: (I) HDP v3.1.0 has a bug wherein the HBCK2 JAR can't be used with the available HBase client & server JARs in a secure cluster. There is no issue with the way your team is using HBCK2; owing to the bug mentioned above, the HBCK2 JAR throws the concerned exception. Without the patched HBase client & server JARs, we can try to re-initialize the HBase cluster, but only if it isn't a production cluster.

(II) The referred JARs aren't available for public download. Unfortunately, I am not aware of any means other than manual intervention (starting HBase on a new data directory & bulk-loading from the previous data directory being one of them). Such issues aren't present from HDP v3.1.5 onwards.

(III) Your team decided to use the bulk-load approach to initialise HBase afresh. [1] shares the steps used by your team.

 

In short, do upgrade to HDP v3.1.5 (a maintenance upgrade from v3.1.0) as soon as possible. Until then, such issues require bulk-loading. The bug causing the HBCK2 issue in a Kerberized environment affects HDP v3.0.0 through HDP v3.1.4 (inclusive) and is fixed in HDP v3.1.5.

 

Thanks again for using Cloudera Community.

 

- Smarak

 

[1] https://community.cloudera.com/t5/Support-Questions/Hbase-namespace-table-in-not-online/m-p/313460/h...


10 REPLIES

Super Collaborator

Hello @Priyanka26 

 

Thanks for using Cloudera Community. Based on the post, your team has namespace region "0c72d4be7e562a2ec8a86c3ec830bdc5" blocking Master startup initialization, and using HBCK2 throws a Kerberos exception.

 

In HDP v3.1.0, we have a bug wherein the HBCK2 JAR can't be used with the available HBase client & server JARs in a secure cluster. There is no issue with the way your team is using HBCK2; owing to the bug mentioned above, the HBCK2 JAR throws the concerned exception. Without the patched HBase client & server JARs, we can try to re-initialize the HBase cluster, but only if it isn't a production cluster.

 

- Smarak

Explorer

@smdas Thank you for your response! It is a production cluster, and that's why we don't want to re-initialize HBase. Is there any other way to recover from this? Also, are the modified HBase client and server JARs available for download?

Super Collaborator

Hello @Priyanka26 

 

Thanks for the update. The referred JARs aren't available for download. Unfortunately, I am not aware of any means other than manual intervention (starting HBase on a new data directory & bulk-loading from the previous data directory being one of them). Such issues aren't present from HDP v3.1.5 onwards.

 

If I find anything, I shall let you know. Yet it's highly unlikely we'll come across any easier solution.

 

- Smarak

Explorer

@smdas Appreciate your response! One last thing: do these steps look correct to you for the recovery process? (A rough command sketch follows the list.)

 

  1. Take a backup of the HBase data directory residing in HDFS: "/apps/hbase/data"
  2. Stop the HBase service.
  3. Connect with the ZooKeeper client and delete the HBase root znode recursively: hbase zkcli rmr /hbase-secure
  4. Start the HBase service.
  5. Once the service is online, stop just the HBase Masters.
  6. Copy "/apps/hbase/data/data" from the backup to the current "/apps/hbase/data/data" HDFS location.
  7. Start the HBase Masters.
  8. Verify that all the namespaces and tables that existed earlier are present.
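
Roughly, in commands, this is what I have in mind. An untested sketch: the backup path is just one I'd pick, and I've used ZooKeeper's recursive "rmr" in step 3 since /hbase-secure has child znodes and a plain "delete" fails on a non-empty znode.

# 1. Back up the HBase data directory in HDFS (backup path is illustrative)
hdfs dfs -cp /apps/hbase/data /apps/hbase/data_backup

# 3. With the HBase service stopped, remove the root znode recursively
hbase zkcli rmr /hbase-secure

# 6. After the fresh start, with only the Masters stopped, restore the table
#    data from the backup
hdfs dfs -cp /apps/hbase/data_backup/data/* /apps/hbase/data/data/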

Thank you so much for all the help!

Super Collaborator

Hello @Priyanka26 

 

Thanks for the update. I haven't tried these steps, yet they look fine on paper. As you are taking a backup of the data directory, we would also have the HFiles to fall back on for any concerns.

 

Do let us know how things go &, most importantly, do plan to upgrade to HDP v3.1.5.

 

- Smarak

Explorer

@smdas Hi! So clearing out the HBase root znode in ZooKeeper and restoring the backup of just the "/apps/hbase/data/data" directory did not bring back the required namespaces and tables. So for data recovery using HFiles, the following steps need to be done, correct?

 

1. Start HBase on a new data directory.

2. Create the required namespace and table structure in HBase, then copy the HFiles from the backup into the respective locations for all tables.

3. Is just copying the HFiles enough, or after this do we need to run the "completebulkload" utility for all tables on the copied HFiles?

 

Problem: I suspect that with this approach we would still require an offline meta repair ("hbase hbck -repair"), which is not available with the HDP version we have.

 

Please let me know your thoughts.

 

 

Super Collaborator

Hi @Priyanka26 

 

Thanks for the update. In the 2nd step, your team mentioned creating the required namespaces & tables. Yet I would suggest bulk-loading, i.e. the CompleteBulkLoad process, as simply copying the HFiles likely won't work. Additionally, the existing HFiles would have been part of splits/compactions, & ideally I expect your team would create the tables with 1 region. As such, bulk-load gracefully handles such situations; a rough sketch follows below.
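
As an illustrative sketch only (the table name, backup path, and region directory below are all placeholders), the bulk-load of one backed-up region directory would look something like this, repeated per region directory of each table:

# Illustrative placeholders throughout. Point the tool at a backed-up region
# directory whose children are the column-family directories holding HFiles;
# the tool splits HFiles as needed to fit the new table's region boundaries.
hbase completebulkload hdfs:///backup/apps/hbase/data/data/default/my_table/REGION_DIR my_table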

 

For customers facing issues like yours on earlier HBase v2.x HDP releases, we typically use bulk-load. Yet I'd again point out that your team should upgrade to HDP v3.1.5 at minimum to avoid this issue in future.

 

- Smarak

Super Collaborator

Hello @Priyanka26 

 

We wish to follow up with your team concerning this post. If the issue is resolved, do mark the post as Solved & share the steps followed by your team so that our fellow community users can learn from your experience as well.

 

Thanks, Smarak
