Created 03-18-2021 10:10 AM
I have a situation where my hbase:namespace system table is not online, and because of that I’m seeing these messages in the HBase master log:
2021-03-17 20:29:54,614 WARN [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
I came across this article for fixing this problem:
https://docs.cloudera.com/runtime/7.2.7/troubleshooting-hbase/topics/hbase_running_hbck2.html
But while following the article and running the suggested command, I’m running into the following problem: a “Failed to specify server's Kerberos principal name” error. I need clarification on the following two points:
==========================================
[root@itk-phx-prod-edge-1 ~]# kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
[root@itk-phx-prod-edge-1 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hbase@PROD.DATALAKE.PHX
Valid starting Expires Service principal
03/18/2021 16:45:53 03/19/2021 16:45:53 krbtgt/PROD.DATALAKE.PHX@PROD.DATALAKE.PHX
===========================================
[root@itk-phx-prod-edge-1 target]# hbase hbck -j hbase-hbck2-1.2.0-SNAPSHOT.jar -s assigns hbase:namespace 1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hbase-hbck2/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.2.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
16:47:07.894 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Connect 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms
16:47:07.962 [ReadOnlyZKClient-itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181@0x560348e6-SendThread(itk-phx-prod-zk-2.datalake.phx:2181)] WARN org.apache.zookeeper.ClientCnxn - SASL configuration failed: javax.security.auth.login.LoginException: Zookeeper client cannot authenticate using the Client section of the supplied JAAS configuration: '/usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf' because of a RuntimeException: java.lang.SecurityException: java.io.IOException: /usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf (No such file or directory) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
16:47:08.253 [main] INFO org.apache.hbase.HBCK2 - Skipped assigns command version check; 'skip' set
16:47:08.838 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Close zookeeper connection 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181
Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:111)
at org.apache.hbase.HBCK2.assigns(HBCK2.java:308)
at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:819)
at org.apache.hbase.HBCK2.run(HBCK2.java:777)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hbase.HBCK2.main(HBCK2.java:1067)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)
at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:106)
... 6 more
Caused by: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:185)
I can attach the complete HBase master log as well if that helps.
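For reference, a minimal sketch of the intended HBCK2 invocation, using the jar path and region from the logs above. Note that per HBCK2 usage, `assigns` expects the *encoded* region name (the trailing hex hash only), not the full `table,,timestamp.hash.` name:

```shell
# Sketch of an HBCK2 assigns invocation (paths taken from this thread; adjust for your cluster).
# 'assigns' expects the *encoded* region name -- the trailing hex hash only.
HBCK2_JAR=/root/hbase-hbck2/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.2.0-SNAPSHOT.jar
ENCODED_REGION=0c72d4be7e562a2ec8a86c3ec830bdc5

# Authenticate as the hbase service user first, then run the tool:
#   kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
#   hbase hbck -j "$HBCK2_JAR" -s assigns "$ENCODED_REGION"
CMD="hbase hbck -j $HBCK2_JAR -s assigns $ENCODED_REGION"
echo "$CMD"
```

On this HDP v3.1.0 cluster the call still fails with the Kerberos error above regardless; the sketch only shows the intended usage.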
Created 05-26-2021 11:26 PM
Hello @Priyanka26
As we haven't heard back from you, we'll summarise the discussion in this post so that it benefits users with similar experiences:
PROBLEM: In HDP v3.1.0, the hbase:namespace region isn't assigned, causing the following message:
2021-03-17 20:29:54,614 WARN [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
Your team tried to use the HBCK2 `assigns` command, but it fails with the following error:
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
DISCUSSION SUMMARY: (I) HDP v3.1.0 has a bug wherein the HBCK2 JAR can't be used with the available hbase-client and hbase-server JARs in a secure (Kerberized) cluster. There is no issue with the way your team is using HBCK2; it is this bug that causes the exception. Without the patched hbase-client and hbase-server JARs, the alternative is to re-initialize the HBase cluster, but only if it isn't a production cluster.
(II) The patched JARs aren't publicly available for download. Unfortunately, I am not aware of any approach other than manual intervention (starting HBase on a new data directory and bulk-loading from the previous data directory being one of them). This issue is not present in HDP v3.1.5 onwards.
(III) Your team decided to use the bulk-load approach to re-initialize HBase afresh. [1] shares the steps your team used.
In short, do upgrade to HDP v3.1.5 (a maintenance upgrade from v3.1.0) as soon as possible. Until then, such issues require bulk-loading. The bug causing the HBCK2 issue in a Kerberized environment affects HDP v3.0.0 through HDP v3.1.4 (inclusive) and is fixed in HDP v3.1.5.
Thanks again for using Cloudera Community.
- Smarak
Created 03-18-2021 10:39 PM
Hello @Priyanka26
Thanks for using Cloudera Community. Based on the post, the namespace region "0c72d4be7e562a2ec8a86c3ec830bdc5" is blocking Master startup initialization, and using HBCK2 throws a Kerberos exception.
HDP v3.1.0 has a bug wherein the HBCK2 JAR can't be used with the available hbase-client and hbase-server JARs in a secure (Kerberized) cluster. There is no issue with the way your team is using HBCK2; it is this bug that causes the exception. Without the patched hbase-client and hbase-server JARs, the alternative is to re-initialize the HBase cluster, but only if it isn't a production cluster.
- Smarak
Created on 03-19-2021 07:45 AM - edited 03-19-2021 07:47 AM
@smdas Thank you for your response! It is a production cluster, and that's why we don't want to re-initialize HBase. Is there any other way to recover from this? Also, are the modified hbase-client and hbase-server JARs available for download?
Created 03-19-2021 08:13 AM
Hello @Priyanka26
Thanks for the update. The patched JARs aren't available for download. Unfortunately, I am not aware of any approach other than manual intervention (starting HBase on a new data directory and bulk-loading from the previous data directory being one of them). This issue is not present in HDP v3.1.5 onwards.
If I find anything, I shall let you know, but it's highly unlikely there's an easier solution.
- Smarak
Created 03-19-2021 01:35 PM
@smdas Appreciate your response! One last thing, do these steps look correct to you for the recovery process:
Thank you so much for all the help!
Created 03-20-2021 01:30 AM
Hello @Priyanka26
Thanks for the update. I haven't tried these steps myself, but they look fine on paper. As you are taking a backup of the data directory, we would still have the HFiles in case of any concerns.
Do let us know how things go and, most importantly, do plan to upgrade to HDP v3.1.5.
- Smarak
Created on 03-23-2021 11:30 AM - edited 03-23-2021 11:36 AM
@smdas Hi! So clearing out the HBase root directory in ZooKeeper and restoring only the backup of the "/apps/hbase/data/data" directory is not going to bring back the required namespaces and tables. So for data recovery using HFiles, the following steps need to be done, correct?
1. Start HBase on a new data directory.
2. Create the required namespace and table structure in HBase, then copy the HFiles from the backup for all tables into their respective locations.
3. Is just copying the HFiles enough, or do we then need to run the "completebulkload" utility for all tables against the copied HFiles?
Problem: I suspect that with this approach we would still require an offline meta repair ("hbase hbck -repair"), which is not available in the HDP version we have.
Please let me know your thoughts.
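As a rough sketch of the three-step flow above: all table, namespace, and backup paths here are hypothetical placeholders, `LoadIncrementalHFiles` is the HBase 2.x bulk-load tool, and the old HFiles would typically need to be staged into per-column-family directories before loading:

```shell
# Hypothetical sketch of the HFile recovery flow; all names/paths are placeholders.
BACKUP_DIR=/backup/apps/hbase/data/data   # copy of the old HBase data directory
TABLE=myns:mytable                        # hypothetical namespace:table

# Step 1: start HBase pointing at a fresh hbase.rootdir (via Ambari / hbase-site.xml).
# Step 2: recreate the namespace and table structure in the HBase shell:
#   hbase shell <<'EOF'
#   create_namespace 'myns'
#   create 'myns:mytable', 'cf'
#   EOF
# Step 3: bulk-load the old HFiles (staged under per-column-family subdirectories)
# rather than just copying them into place:
CMD="hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles $BACKUP_DIR/staged/mytable $TABLE"
echo "$CMD"
```

This only illustrates the shape of the procedure; the actual staging layout depends on how the regions and column families were laid out in the backup.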
Created 03-25-2021 11:55 PM
Hi @Priyanka26
Thanks for the update. In the 2nd step, your team mentioned creating the required namespaces and tables. I would suggest bulk-loading (the CompleteBulkLoad process), as simply copying the HFiles is unlikely to work. Additionally, the existing HFiles would have been through splits and compactions, while your team would presumably create the new tables with one region each; bulk-load handles such situations gracefully.
For customers facing issues like yours on earlier HBase v2.x HDP releases, we typically use bulk-load. That said, your team should upgrade to at least HDP v3.1.5 to avoid this issue in future.
- Smarak
Created 05-02-2021 12:02 AM
Hello @Priyanka26
We wish to follow up with your team concerning this post. If the issue is resolved, do mark the post as Solved and share the steps your team followed, so that fellow community users can learn from your experience as well.
Thanks, Smarak