HBase 2 Kerberos GSS failure: RegionServer fails to connect to the Master, throwing a “GSS initiate failed” error.

Visitor

Hi everyone,

 

I’m encountering a critical issue with HBase 2.x: the RegionServer fails to connect to the Master, throwing a “GSS initiate failed” error.

 

Environment:

  • Master node: host117
  • RegionServer node: host121
  • Kerberos security is enabled
 

To troubleshoot this, I’ve performed the following checks and fixes, all verified as successful (✅):

 
  1. Time Synchronization
    Clock skew across cluster nodes is only 8 seconds, well within Kerberos tolerance (typically ≤ 5 minutes).

  2. Hostname Resolution
    Added explicit entries in /etc/hosts for both host117 and host121 to ensure bidirectional hostname resolution, eliminating potential Kerberos failures due to DNS issues.

  3. Network Connectivity
    Confirmed TCP connectivity to the Master’s RPC port using telnet host117 16000.

  4. Kerberos Client Configuration (/etc/krb5.conf)

    • Verified KDC is reachable and TGS requests succeed.
    • Confirmed support for AES256 and AES128 encryption types, matching HBase requirements.
  5. JAAS Configuration Fix

    • Added the Server login module in the JAAS config file.
    • Ensured critical parameters are correctly set: useKeyTab=true, valid keyTab path, and accurate principal.
    • Explicitly set useTicketCache=false to prevent ticket cache interference with keytab-based authentication.
  6. HBase Security Settings (hbase-site.xml)

    • Confirmed hbase.security.authentication=kerberos.
    • Validated correct configuration of hbase.master.kerberos.principal and hbase.regionserver.kerberos.principal.
  7. Kerberos Ticket Acquisition & Validation

    • Successfully obtained tickets using kinit -kt <keytab> <principal>.
    • Verified ticket validity, service principal, and encryption type via klist.
  8. Ticket Cache Cleanup

    • Ran kdestroy to clear any stale tickets that might cause conflicts.
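The hbase-site.xml check in item 6 is easy to repeat on every node with a small script. What follows is only a minimal sketch in Python, not an HBase tool: the function name `read_security_properties` is made up for illustration, and the three property names are the ones listed above.

```python
import xml.etree.ElementTree as ET

# Property names taken from the checks listed in the post.
SECURITY_KEYS = (
    "hbase.security.authentication",
    "hbase.master.kerberos.principal",
    "hbase.regionserver.kerberos.principal",
)


def read_security_properties(xml_source):
    """Return {property name: value or None} for the Kerberos-related settings.

    xml_source is the content of hbase-site.xml (str or bytes).
    A value of None means the property is absent from the file.
    """
    root = ET.fromstring(xml_source)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in SECURITY_KEYS:
            # Strip whitespace: stray spaces around a principal are easy to miss.
            found[name] = (prop.findtext("value") or "").strip()
    return {key: found.get(key) for key in SECURITY_KEYS}
```

Running it against the file on host117 and on host121 and comparing the two results should show quickly whether the Master and the RegionServer actually agree on the principals.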
 

Despite all the above checks passing, the issue persists.
Has anyone else encountered a similar “GSS initiate failed” error?
Any suggestions on what I might have missed or additional debugging steps would be greatly appreciated!

 

3 REPLIES

Community Manager

@scala_ Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our HBase experts @shubham_sharma @smdas @pajoshi  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Senior Community Moderator



Master Collaborator

Hi @scala_ 

Could you please share the full error message along with the stack trace?
That will help us analyze the issue more accurately and guide you better.

Expert Contributor

@scala_ FYI

➤ It appears you have performed an exhaustive verification of the standard Kerberos and HBase configurations. The "GSS initiate failed" error in a Kerberized HBase environment, especially when standard connectivity and ticket validation pass, often points to subtle mismatches in how the Java process handles the security handshake or how the underlying OS interacts with the Kerberos libraries.

➤ Based on the logs and environment details you provided, here are the most likely remaining causes for this issue:

1. Java Cryptography Extension (JCE) and Encryption Types
While you confirmed support for AES256 in krb5.conf, the Java Runtime Environment (JRE) itself may be restricting it.

-The Issue: Older versions of Java 8 require the JCE Unlimited Strength Jurisdiction Policy Files to be manually installed to handle 256-bit encryption. If the Master is sending an AES256 ticket but the RegionServer's JVM is restricted, the GSS initiation will fail.

-The Fix: Ensure the JCE policy files are installed or, on a newer JDK, that the java.security file does not restrict key strength (where the crypto.policy security property is present, it should be set to unlimited). You can also temporarily restrict permitted_enctypes in krb5.conf to aes128-cts-hmac-sha1-96 to see whether the connection succeeds with a 128-bit key.

2. Reverse DNS (RDNS) Mismatch
Kerberos is extremely sensitive to how hostnames are resolved.
-The Issue: Even with entries in /etc/hosts, Java's GSSAPI often performs a reverse DNS lookup on the Master's IP. If the IP 10.51.39.121 (from your previous logs) resolves to a different hostname (or no hostname at all) than what is in your keytab (host117), the "GSS initiate" will fail.

-The Fix: Add rdns = false to the [libdefaults] section of your /etc/krb5.conf on all nodes. This forces Kerberos to use the hostname provided by the application rather than trying to resolve the IP back to a name.
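Before changing krb5.conf, it may help to see what the forward and reverse lookups actually return on each node. The sketch below is an illustration only: `resolution_report` is a hypothetical helper, and it uses Python's resolver (which goes through the OS, so /etc/hosts is normally consulted) rather than the JVM's, so treat the result as a hint, not as proof of what HBase sees.

```python
import socket


def resolution_report(hostname,
                      forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> name and report whether the round trip matches.

    forward and reverse are injectable so the logic can be exercised
    without touching the network; by default they use the OS resolver.
    """
    ip = forward(hostname)
    try:
        reverse_name = reverse(ip)[0]  # primary name returned for the IP
    except OSError:  # socket.herror / socket.gaierror: no reverse entry
        reverse_name = None
    matches = (reverse_name is not None
               and reverse_name.lower() == hostname.lower())
    return {"hostname": hostname, "ip": ip,
            "reverse": reverse_name, "matches": matches}
```

Calling `resolution_report("host117")` on host121 (and the other way round) shows whether the name comes back unchanged. Note that the comparison is deliberately strict, so a short name that reverses to an FQDN is reported as a mismatch.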

3. Service Principal Name (SPN) Case Sensitivity
In hbase-site.xml, the principals are often defined with _HOST placeholders.
-The Issue: If hbase.master.kerberos.principal is set to hbase/_HOST@REALM, HBase replaces _HOST with the fully qualified domain name (FQDN). If your system reports the FQDN as host117.kfs.local but the Kerberos Database (KDB) only has hbase/host117@REALM, the handshake fails.

-The Fix: Ensure the hostname returned by hostname -f exactly matches the host component of the principal stored in the keytab (the part between hbase/ and @REALM).
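To make the _HOST mismatch concrete, here is a rough sketch of the substitution and the keytab comparison. It is a simplified model, not HBase's or Hadoop's code: the helper names are invented, and the lower-casing of the hostname reflects my understanding of how Hadoop expands _HOST, so treat that as an assumption.

```python
def expand_host(principal_pattern, fqdn):
    """Replace a _HOST instance component with the lower-cased host name.

    Assumption: the hostname is lower-cased on substitution. Patterns
    without _HOST are returned unchanged.
    """
    service, slash, rest = principal_pattern.partition("/")
    host, at, realm = rest.partition("@")
    if slash and host == "_HOST":
        expanded = f"{service}/{fqdn.lower()}"
        return f"{expanded}@{realm}" if at else expanded
    return principal_pattern


def principal_in_keytab(principal_pattern, fqdn, keytab_principals):
    """Return (expanded principal, True if it is among the keytab entries)."""
    expanded = expand_host(principal_pattern, fqdn)
    return expanded, expanded in set(keytab_principals)
```

With the example from this reply, `principal_in_keytab("hbase/_HOST@REALM", "host117.kfs.local", ["hbase/host117@REALM"])` expands to `hbase/host117.kfs.local@REALM`, which is not in the keytab. The list of keytab principals would come from klist -kt on the real keytab.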

4. JAAS "Server" vs. "Client" Sections
Your post mentioned: “Added the Server login module in the JAAS config file.”
-The Issue: In HBase, the RegionServer acts as a Client when connecting to the Master. If your JAAS configuration only has a Server section and is missing a Client section (or if the Client section has incorrect keytab details), the RegionServer will fail to initiate the GSS context toward the Master.

-The Fix: Ensure your JAAS file contains both sections, and that the Client section points to the correct RegionServer keytab/principal.
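A quick way to confirm which sections a JAAS file really defines is to list them. This is a minimal sketch with hypothetical helper names; it only scans for top-level section names and does not validate the login module options inside them.

```python
import re


def jaas_sections(text):
    """Return the names of the login sections defined in a JAAS config."""
    # Remove /* ... */ and // comments so commented-out sections are ignored.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", "", text)
    # A section starts with a name followed by an opening brace.
    return re.findall(r"^\s*([A-Za-z_][\w.-]*)\s*\{", text, flags=re.MULTILINE)


def missing_sections(text, required=("Client", "Server")):
    """Return the required section names that the config does not define."""
    present = set(jaas_sections(text))
    return [name for name in required if name not in present]
```

If `missing_sections(open(jaas_path).read())` returns `["Client"]` for the RegionServer's file, that would be consistent with the cause described in point 4. Here `jaas_path` stands for whatever file the RegionServer's java.security.auth.login.config setting points to.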