Created 10-27-2025 11:25 PM
Hi everyone,
I’m encountering a critical issue with HBase 2.x: the RegionServer fails to connect to the Master, throwing a “GSS initiate failed” error.
Environment:
To troubleshoot this, I’ve performed the following checks and fixes, all verified as successful (✅):
Time Synchronization
Clock skew across cluster nodes is only 8 seconds, well within Kerberos tolerance (typically ≤ 5 minutes).
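For anyone repeating this check, here is a minimal Python sketch of the comparison. It assumes you have already collected an epoch timestamp from each node at roughly the same moment (for example with `date +%s`). The function names and the sample values are hypothetical, and the 300-second default mirrors the usual Kerberos clock-skew tolerance rather than anything read from this cluster's krb5.conf.

```python
def max_clock_skew(epoch_by_host):
    """Return the largest difference, in seconds, between any two node clocks."""
    times = list(epoch_by_host.values())
    return max(times) - min(times)


def within_kerberos_tolerance(epoch_by_host, tolerance_seconds=300):
    """True if the worst-case skew is inside the assumed Kerberos tolerance."""
    return max_clock_skew(epoch_by_host) <= tolerance_seconds


# Illustrative values only: two nodes whose clocks differ by 8 seconds.
sample = {"host117": 1761600000, "host121": 1761600008}
skew = max_clock_skew(sample)
ok = within_kerberos_tolerance(sample)
```

Timestamps gathered by hand include the delay between the two readings, so treat the result as an upper bound on the real skew.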
Hostname Resolution
Added explicit entries in /etc/hosts for both host117 and host121 to ensure bidirectional hostname resolution, eliminating potential Kerberos failures due to DNS issues.
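A forward lookup alone does not show that both directions agree. Here is a rough Python sketch of a forward-plus-reverse check, using only the standard `socket` module. The function name `check_forward_reverse` is hypothetical, and the resolver arguments are injectable only so the logic can be exercised without real DNS. Service principals commonly embed the host's canonical name, so a short name in one direction and a fully qualified name in the other may be worth ruling out.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back to a name, and compare.

    Returns (ip, reverse_name, consistent). `consistent` is True only when
    the reverse name equals the name we started with (case-insensitive,
    ignoring a trailing dot).
    """
    ip = forward(hostname)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    consistent = (reverse_name.lower().rstrip(".")
                  == hostname.lower().rstrip("."))
    return ip, reverse_name, consistent
```

On each node, calling `check_forward_reverse("host117")` and `check_forward_reverse("host121")` inside a `try/except OSError` shows whether the /etc/hosts entries return the same name in both directions. A `consistent` value of False is not an error by itself, only a hint about which name the Kerberos principal would need to match.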
Network Connectivity
Confirmed TCP connectivity to the Master’s RPC port using telnet host117 16000.
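The same reachability check can be scripted so it can be repeated from each node. This is a minimal sketch using Python's standard `socket` module. The helper name `port_is_open` and the 3-second timeout are my own choices, not anything HBase provides.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For the setup above, the call would be `port_is_open("host117", 16000)`. Like telnet, a True result only shows that the port accepts connections. It says nothing about whether the Kerberos handshake that follows will succeed.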
Kerberos Client Configuration (/etc/krb5.conf)
JAAS Configuration Fix
HBase Security Settings (hbase-site.xml)
Kerberos Ticket Acquisition & Validation
Ticket Cache Cleanup
Despite all the above checks passing, the issue persists.
Has anyone else encountered a similar “GSS initiate failed” error?
Any suggestions on what I might have missed or additional debugging steps would be greatly appreciated!
Created 11-05-2025 10:10 AM
@scala_ Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our HBase experts @shubham_sharma @smdas @pajoshi who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created on 11-06-2025 04:05 AM - edited 11-06-2025 04:05 AM
Hi @scala_
Could you please share the full error message along with the stack trace?
That will help us analyze the issue more accurately and guide you better.