Created 03-18-2021 10:10 AM
My hbase:namespace system table is not online, and because of that I'm seeing the following message in the HBase Master log:
2021-03-17 20:29:54,614 WARN [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
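For reference, the parts of that WARN line can be pulled apart with a small shell function. This is a hypothetical helper (parse_not_online_warn is not part of HBase or HBCK2). It assumes the usual <table>,<startkey>,<regionId>.<encodedName>. region-name layout. The encoded name it prints is the hash suffix of the region name, which, as far as I know, is what the HBCK2 usage text describes as the argument to assigns.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of HBase or HBCK2): split an HMaster
# "is NOT online" WARN line into table, encoded region name and server.
# Assumes the region name looks like <table>,<startkey>,<regionId>.<encodedName>.
parse_not_online_warn() {
  local line="$1" region table encoded server
  # Full region name: the text between "master.HMaster: " and " is NOT online".
  region=$(sed -n 's/.*master\.HMaster: \(.*\) is NOT online.*/\1/p' <<<"$line")
  # Table name: everything before the first comma.
  table=${region%%,*}
  # Encoded name: the 32 hex characters between the last two dots.
  encoded=$(sed -n 's/.*\.\([0-9a-f]\{32\}\)\.$/\1/p' <<<"$region")
  # Server the Master believes holds the region (host,port,startcode).
  server=$(sed -n 's/.*server=\([^}]*\)}.*/\1/p' <<<"$line")
  printf 'table=%s\nencoded=%s\nserver=%s\n' "$table" "$encoded" "$server"
}
```

Usage (the log path is a placeholder): parse_not_online_warn "$(grep 'is NOT online' /path/to/hbase-master.log | tail -n 1)"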
I found this article describing how to fix this problem:
https://docs.cloudera.com/runtime/7.2.7/troubleshooting-hbase/topics/hbase_running_hbck2.html
However, when I follow the article and run the suggested command, it fails with a “Failed to specify server's Kerberos principal name” error. I need clarification on the following two points:
==========================================
[root@itk-phx-prod-edge-1 ~]# kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
[root@itk-phx-prod-edge-1 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hbase@PROD.DATALAKE.PHX
Valid starting Expires Service principal
03/18/2021 16:45:53 03/19/2021 16:45:53 krbtgt/PROD.DATALAKE.PHX@PROD.DATALAKE.PHX
===========================================
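A small pre-flight wrapper can at least confirm that a Kerberos ticket is in the cache before HBCK2 is invoked. This is a hypothetical sketch (run_hbck2_assigns is not a real tool). It relies on MIT klist's -s flag, which prints nothing and only sets the exit status, and it passes -s to HBCK2 as in the command below to skip the version check. A valid ticket does not avoid the “Failed to specify server's Kerberos principal name” error discussed in this thread; it only rules out a missing or expired ticket.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight wrapper around the HBCK2 "assigns" call.
# Usage: run_hbck2_assigns <path-to-hbck2-jar> <encoded-region-name>...
run_hbck2_assigns() {
  local jar="$1"
  shift
  # MIT klist -s prints nothing; exit status 0 means a valid ticket cache exists.
  if ! klist -s; then
    echo "No valid Kerberos ticket; run kinit with the hbase keytab first." >&2
    return 1
  fi
  # -s skips HBCK2's version check, as in the original command.
  hbase hbck -j "$jar" -s assigns "$@"
}
```

Usage: run_hbck2_assigns hbase-hbck2-1.2.0-SNAPSHOT.jar 0c72d4be7e562a2ec8a86c3ec830bdc5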
[root@itk-phx-prod-edge-1 target]# hbase hbck -j hbase-hbck2-1.2.0-SNAPSHOT.jar -s assigns hbase:namespace 1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hbase-hbck2/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.2.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/phoenix/phoenix-5.0.0.3.1.0.0-78-server.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
16:47:07.894 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Connect 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181 with session timeout=90000ms, retries 6, retry interval 1000ms, keepAlive=60000ms
16:47:07.962 [ReadOnlyZKClient-itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181@0x560348e6-SendThread(itk-phx-prod-zk-2.datalake.phx:2181)] WARN org.apache.zookeeper.ClientCnxn - SASL configuration failed: javax.security.auth.login.LoginException: Zookeeper client cannot authenticate using the Client section of the supplied JAAS configuration: '/usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf' because of a RuntimeException: java.lang.SecurityException: java.io.IOException: /usr/hdp/current/hbase-client/conf/hbase_regionserver_jaas.conf (No such file or directory) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
16:47:08.253 [main] INFO org.apache.hbase.HBCK2 - Skipped assigns command version check; 'skip' set
16:47:08.838 [main] INFO org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient - Close zookeeper connection 0x560348e6 to itk-phx-prod-zk-1.datalake.phx:2181,itk-phx-prod-zk-2.datalake.phx:2181,itk-phx-prod-zk-3.datalake.phx:2181
Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:111)
at org.apache.hbase.HBCK2.assigns(HBCK2.java:308)
at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:819)
at org.apache.hbase.HBCK2.run(HBCK2.java:777)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hbase.HBCK2.main(HBCK2.java:1067)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)
at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:106)
... 6 more
Caused by: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:185)
I can attach the complete HBase Master log as well if that helps.
Created 05-26-2021 11:26 PM
Hello @Priyanka26
As we haven't heard back from you, we'll summarise the discussion in this post so that it benefits users with similar experiences:
PROBLEM: In HDP v3.1.0, the hbase:namespace region isn't assigned, which causes the following message:
2021-03-17 20:29:54,614 WARN [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
Your team tried to use the HBCK2 assigns command, but it fails with the following error:
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
DISCUSSION SUMMARY: (I) In the customer's HDP v3.1.0, there is a bug whereby the HBCK2 JAR can't be used with the shipped HBase client and HBase server JARs in a secure (Kerberized) cluster. There is nothing wrong with the way your team is using HBCK2; the exception is thrown because of this bug. Without modified HBase client and server JARs, we can try re-initialising the HBase cluster, but only if it isn't a production cluster.
(II) The modified JARs referred to above aren't publicly available for download. Unfortunately, I'm not aware of any approach other than manual intervention, such as starting HBase on a new data directory and bulk-loading from the previous data directory. This issue isn't present in HDP v3.1.5 onwards.
(III) Your team decided to use the bulk-load approach so that HBase is initialised afresh. [1] lists the steps your team used.
In short, upgrade to HDP v3.1.5 as soon as possible (a maintenance upgrade from v3.1.0). Until then, such issues require bulk-loading. The bug that breaks HBCK2 in a Kerberized environment affects HDP v3.0.0 through v3.1.4 inclusive and is fixed in HDP v3.1.5.
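To check whether a given stack falls in the affected range given in this summary (HDP v3.0.0 through v3.1.4, fixed in v3.1.5), a hypothetical shell helper can compare the HDP version string, in the form that appears in paths such as /usr/hdp/3.1.0.0-78. The range comes from this thread, not from an official advisory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if the HDP version is in the range this
# thread reports as affected by the HBCK2/Kerberos bug (3.0.0 to 3.1.4
# inclusive), 1 otherwise. Accepts strings such as "3.1.0.0-78".
hbck2_kerberos_bug_affects() {
  local ver="${1%%-*}"   # drop the build suffix, e.g. "-78"
  local major minor patch rest
  IFS=. read -r major minor patch rest <<<"$ver"
  if [ "$major" = 3 ]; then
    # 3.0.x is affected; 3.1.x is affected up to and including 3.1.4.
    if [ "$minor" = 0 ]; then return 0; fi
    if [ "$minor" = 1 ] && [ "${patch:-0}" -le 4 ]; then return 0; fi
  fi
  return 1
}
```

Usage: hbck2_kerberos_bug_affects 3.1.0.0-78 && echo "affected: plan the maintenance upgrade to 3.1.5"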
Thanks again for using Cloudera Community.
- Smarak