Created 10-27-2025 11:25 PM
Hi everyone,
I’m encountering a critical issue with HBase 2.x: the RegionServer fails to connect to the Master, throwing a “GSS initiate failed” error.
Environment:
To troubleshoot this, I’ve performed the following checks and fixes, all verified as successful (✅):
Time Synchronization
Clock skew across cluster nodes is only 8 seconds, well within Kerberos tolerance (typically ≤ 5 minutes).
Hostname Resolution
Added explicit entries in /etc/hosts for both host117 and host121 to ensure bidirectional hostname resolution, eliminating potential Kerberos failures due to DNS issues.
Network Connectivity
Confirmed TCP connectivity to the Master’s RPC port using telnet host117 16000.
Kerberos Client Configuration (/etc/krb5.conf)
JAAS Configuration Fix
HBase Security Settings (hbase-site.xml)
Kerberos Ticket Acquisition & Validation
Ticket Cache Cleanup
Despite all the above checks passing, the issue persists.
Has anyone else encountered a similar “GSS initiate failed” error?
Any suggestions on what I might have missed or additional debugging steps would be greatly appreciated!
Created 11-05-2025 10:10 AM
@scala_ Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our HBase experts @shubham_sharma @smdas @pajoshi who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created on 11-06-2025 04:05 AM - edited 11-06-2025 04:05 AM
Hi @scala_
Could you please share the full error message along with the stack trace?
That will help us analyze the issue more accurately and guide you better.
Created 01-10-2026 10:40 PM
@scala_ FYI
➤ It appears you have performed an exhaustive verification of the standard Kerberos and HBase configurations. The "GSS initiate failed" error in a Kerberized HBase environment, especially when standard connectivity and ticket validation pass, often points to subtle mismatches in how the Java process handles the security handshake or how the underlying OS interacts with the Kerberos libraries.
➤ Based on the logs and environment details you provided, here are the most likely remaining causes for this issue:
1. Java Cryptography Extension (JCE) and Encryption Types
While you confirmed support for AES256 in krb5.conf, the Java Runtime Environment (JRE) itself may be restricting it.
-The Issue: Java 8 releases before 8u161 ship with a limited-strength crypto policy by default and need the JCE Unlimited Strength Jurisdiction Policy Files installed manually to handle 256-bit keys. If the service ticket for the Master is AES256-encrypted but the RegionServer's JVM is restricted, the GSS initiation will fail.
-The Fix: Ensure the JCE policy files are installed, or, on a modern OpenJDK, confirm that crypto.policy in the java.security file is not set to limited. You can also temporarily restrict permitted_enctypes in krb5.conf to aes128-cts-hmac-sha1-96 to see whether the connection succeeds with a shorter key length.
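A quick way to test the JCE theory is to ask the JVM directly how large an AES key it permits. The following is a minimal sketch, assuming you run it with the same JDK the RegionServer uses; the class name JceAesCheck is just an illustrative name.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

/** Reports whether this JVM permits AES-256 keys. */
final class JceAesCheck {

    /** Maximum AES key length (in bits) allowed by the installed crypto policy. */
    static int maxAesKeyBits() {
        try {
            // Integer.MAX_VALUE means the unlimited-strength policy is active.
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("AES is not available in this JVM", e);
        }
    }

    /** True if a policy limit of maxBits is enough for AES-256. */
    static boolean supportsAes256(int maxBits) {
        return maxBits >= 256;
    }

    public static void main(String[] args) {
        int bits = maxAesKeyBits();
        System.out.println("java.version     = " + System.getProperty("java.version"));
        System.out.println("max AES key bits = "
                + (bits == Integer.MAX_VALUE ? "unlimited" : String.valueOf(bits)));
        System.out.println("AES-256 usable   = " + supportsAes256(bits));
    }
}
```

If this reports a limit of 128, the JVM cannot handle aes256-cts-hmac-sha1-96 tickets until the policy is changed.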
2. Reverse DNS (RDNS) Mismatch
Kerberos is extremely sensitive to how hostnames are resolved.
-The Issue: Even with entries in /etc/hosts, Java's GSSAPI often performs a reverse DNS lookup on the Master's IP. If the IP 10.51.39.121 (from your previous logs) resolves to a different hostname than the one in your keytab (host117), or to no hostname at all, the GSS initiation will fail.
-The Fix: Add rdns = false to the [libdefaults] section of your /etc/krb5.conf on all nodes. This forces Kerberos to use the hostname provided by the application rather than trying to resolve the IP back to a name.
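To see what the JVM itself gets back for forward and reverse lookups, something like the sketch below may help. It only uses java.net.InetAddress; the class name DnsRoundTrip and the normalization rule (lowercase, drop a trailing dot) are my own assumptions for illustration, not HBase behavior.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

/** Resolves a hostname to an IP and back, then compares the two names. */
final class DnsRoundTrip {

    /** Lowercases the name and strips a trailing dot so names compare cleanly. */
    static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        return h.endsWith(".") ? h.substring(0, h.length() - 1) : h;
    }

    /** True if both names are identical after normalization. */
    static boolean sameHost(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) throws UnknownHostException {
        if (args.length != 1) {
            System.err.println("usage: java DnsRoundTrip <hostname>");
            return;
        }
        InetAddress forward = InetAddress.getByName(args[0]);
        // Rebuild the address from raw bytes so no hostname is attached to it;
        // getCanonicalHostName() then has to perform a reverse lookup.
        InetAddress byIp = InetAddress.getByAddress(forward.getAddress());
        String reverse = byIp.getCanonicalHostName();

        System.out.println("forward: " + args[0] + " -> " + forward.getHostAddress());
        System.out.println("reverse: " + forward.getHostAddress() + " -> " + reverse);
        System.out.println("round trip matches: " + sameHost(args[0], reverse));
    }
}
```

Run it on each node against both host117 and host121. If the reverse name comes back as a bare IP or as a different name (for example short name versus FQDN), that is worth fixing before looking further.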
3. Service Principal Name (SPN) Hostname Mismatch
In hbase-site.xml, the principals are often defined with _HOST placeholders.
-The Issue: If hbase.master.kerberos.principal is set to hbase/_HOST@REALM, HBase replaces _HOST with the fully qualified domain name (FQDN). If your system reports the FQDN as host117.kfs.local but the Kerberos Database (KDB) only has hbase/host117@REALM, the handshake fails.
-The Fix: Ensure the output of the hostname -f command exactly matches the host component of the principal stored in the keytab.
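The comparison can be made concrete by expanding the placeholder by hand. This is a rough sketch that imitates the _HOST substitution rather than calling any Hadoop or HBase code; lowercasing the hostname is an assumption on my part, and the class name HostPlaceholder is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

/** Imitates the _HOST substitution so the result can be compared with the keytab. */
final class HostPlaceholder {

    /** Replaces _HOST in the pattern with the lowercased hostname (assumed behavior). */
    static String expand(String principalPattern, String fqdn) {
        return principalPattern.replace("_HOST", fqdn.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws UnknownHostException {
        // The name the JVM considers canonical for this machine.
        String fqdn = InetAddress.getLocalHost().getCanonicalHostName();
        System.out.println("canonical hostname: " + fqdn);
        System.out.println("expanded principal: " + expand("hbase/_HOST@REALM", fqdn));
    }
}
```

Compare the printed principal character by character with the entries that klist -kt shows for the HBase keytab on that node. With the example from above, a node reporting host117.kfs.local would expand to hbase/host117.kfs.local@REALM, which does not match a keytab entry of hbase/host117@REALM.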
4. JAAS "Server" vs. "Client" Sections
Your earlier logs mentioned: “Added the Server login module in the JAAS config file.”
-The Issue: In HBase, the RegionServer acts as a Client when connecting to the Master. If your JAAS configuration only has a Server section and is missing a Client section (or if the Client section has incorrect keytab details), the RegionServer will fail to initiate the GSS context toward the Master.
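-The Fix: Ensure your JAAS file contains both sections, and that the Client section points to the correct RegionServer keytab/principal. Also confirm that hbase.regionserver.keytab.file and hbase.regionserver.kerberos.principal in hbase-site.xml reference the same keytab and principal, since HBase reads its own login credentials from those properties.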
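If you want to confirm which sections the JVM actually parses from the JAAS file, the standard javax.security.auth.login.Configuration API can load it directly. The sketch below is only an illustration: the class name JaasSectionCheck is made up, and the file path is whatever you pass on the command line (the same path you give to -Djava.security.auth.login.config).

```java
import java.io.File;
import java.security.NoSuchAlgorithmException;
import java.security.URIParameter;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

/** Loads a JAAS file and reports which named sections it defines. */
final class JaasSectionCheck {

    /** Returns the entries for a section, or null if the section is missing. */
    static AppConfigurationEntry[] entries(File jaasFile, String section) {
        try {
            Configuration conf = Configuration.getInstance(
                    "JavaLoginConfig", new URIParameter(jaasFile.toURI()));
            return conf.getAppConfigurationEntry(section);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("JavaLoginConfig is not available", e);
        }
    }

    /** True if the file defines at least one login module under the section. */
    static boolean hasSection(File jaasFile, String section) {
        AppConfigurationEntry[] found = entries(jaasFile, section);
        return found != null && found.length > 0;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java JaasSectionCheck <jaas-file>");
            return;
        }
        File jaas = new File(args[0]);
        for (String section : new String[] {"Client", "Server"}) {
            AppConfigurationEntry[] found = entries(jaas, section);
            if (found == null) {
                System.out.println(section + ": MISSING");
                continue;
            }
            for (AppConfigurationEntry e : found) {
                // Print the two options that most often carry a typo.
                System.out.println(section + ": " + e.getLoginModuleName()
                        + " keyTab=" + e.getOptions().get("keyTab")
                        + " principal=" + e.getOptions().get("principal"));
            }
        }
    }
}
```

This only checks that the file parses and that the sections exist; it does not attempt a login, so a wrong keytab path would still need to be verified separately with kinit -kt.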