Member since
07-19-2020
163
Posts
16
Kudos Received
11
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1064 | 05-01-2025 05:58 AM |
| | 1253 | 04-29-2025 09:43 AM |
| | 1286 | 04-28-2025 07:01 AM |
| | 1552 | 10-22-2024 05:23 AM |
| | 1759 | 10-11-2024 04:28 AM |
01-10-2026
10:40 PM
@scala_ FYI

➤ It appears you have performed an exhaustive verification of the standard Kerberos and HBase configurations. The "GSS initiate failed" error in a Kerberized HBase environment, especially when standard connectivity and ticket validation pass, often points to subtle mismatches in how the Java process handles the security handshake or how the underlying OS interacts with the Kerberos libraries.

➤ Based on the logs and environment details you provided, here are the most likely remaining causes for this issue (a short verification sketch follows the list):

1. Java Cryptography Extension (JCE) and Encryption Types
While you confirmed support for AES256 in krb5.conf, the Java Runtime Environment (JRE) itself may be restricting it.
- The Issue: Older versions of Java 8 require the JCE Unlimited Strength Jurisdiction Policy Files to be manually installed to handle 256-bit encryption. If the Master sends an AES256 ticket but the RegionServer's JVM is restricted, the GSS initiation will fail.
- The Fix: Ensure the JCE policy files are installed, or, if you are using a modern OpenJDK, ensure the java.security file allows all encryption strengths. You can also temporarily restrict permitted_enctypes in krb5.conf to aes128-cts-hmac-sha1-96 to see whether the connection succeeds at a lower encryption strength.

2. Reverse DNS (RDNS) Mismatch
Kerberos is extremely sensitive to how hostnames are resolved.
- The Issue: Even with entries in /etc/hosts, Java's GSSAPI often performs a reverse DNS lookup on the Master's IP. If the IP 10.51.39.121 (from your previous logs) resolves to a different hostname (or no hostname at all) than what is in your keytab (host117), the GSS initiation will fail.
- The Fix: Add rdns = false to the [libdefaults] section of /etc/krb5.conf on all nodes. This forces Kerberos to use the hostname provided by the application rather than resolving the IP back to a name.

3. Service Principal Name (SPN) Hostname Mismatch
In hbase-site.xml, the principals are often defined with _HOST placeholders.
- The Issue: If hbase.master.kerberos.principal is set to hbase/_HOST@REALM, HBase replaces _HOST with the fully qualified domain name (FQDN). If your system reports the FQDN as host117.kfs.local but the Kerberos database only contains hbase/host117@REALM, the handshake fails.
- The Fix: Ensure the output of hostname -f exactly matches the principal stored in the keytab.

4. JAAS "Server" vs. "Client" Sections
Your earlier logs mentioned: "Added the Server login module in the JAAS config file."
- The Issue: In HBase, the RegionServer acts as a Client when connecting to the Master. If your JAAS configuration only has a Server section and is missing a Client section (or if the Client section has incorrect keytab details), the RegionServer will fail to initiate the GSS context toward the Master.
- The Fix: Ensure your JAAS file contains both sections, and that the Client section points to the correct RegionServer keytab/principal.
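A minimal verification sketch for points 1–3 above, assuming a JDK 8 layout and an HBase service keytab; the keytab path, JAVA_HOME layout, and the debug flag placement are assumptions to adapt to your environment:

```bash
# Hedged verification sketch -- paths and keytab names below are assumptions, adjust for your cluster.

# 1. JCE / encryption strength: on JDK 8u161+ this should show crypto.policy=unlimited
#    (this is the JDK 8 layout; JDK 11+ uses $JAVA_HOME/conf/security/java.security).
grep -n 'crypto.policy' "$JAVA_HOME/jre/lib/security/java.security"

# 2. Hostname vs. keytab principal: the FQDN reported here must match the SPN in the keytab.
hostname -f
klist -kte /etc/security/keytabs/hbase.service.keytab   # -e also prints the encryption types

# 3. Reverse DNS and enctype settings currently in effect on this node.
grep -E 'rdns|permitted_enctypes|default_tkt_enctypes' /etc/krb5.conf

# Optional: enable GSS/Kerberos debug output for the RegionServer JVM before restarting it.
# export HBASE_OPTS="$HBASE_OPTS -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true"
```

If the debug output shows the service ticket being requested for an unexpected hostname, that usually confirms the RDNS or _HOST substitution issue described above.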
01-09-2026
11:30 PM
@G_B FYI

➤ Based on your report, your data is physically safe on the disks, but the HDFS metadata link is broken because your DataNodes cannot talk to your NameNode properly. Here is the breakdown of why this is happening and how to fix it (a short command sketch follows the list).

1. The Result: "Missing Blocks" and "Forbidden" Errors
- Missing Blocks (6303): Your NameNode knows the files should exist (the metadata is loaded), but because the DataNode blockReport failed due to the Java error, the DataNodes haven't told the NameNode which blocks they are holding.
- Num of Blocks: 0: Look at your DataNode dfsadmin report. It says Num of Blocks: 0. The NameNode thinks that node is empty because the block report failed.
- Head of file / Forbidden: Since the NameNode thinks there are 0 blocks available, it tells your client "I have no DataNodes to give you for this file."

2. Solution Step 1: Restart all Hadoop services (NameNode first, then DataNodes).

3. Solution Step 2: Clear the "Lease" and Trigger Block Reports
=> Check the DataNode logs. You want to see: Successfully sent block report.
=> Once the reports are successful, run the report again: hdfs dfsadmin -report.
=> The "Missing Blocks" count should start dropping toward zero, and Num of Blocks should increase.

4. Troubleshooting the "File Not Found" in fsck
The reason fsck said the file didn't exist while ls showed it is likely NameNode Safe Mode. When a NameNode starts up and sees 6,000+ missing blocks, it often enters Safe Mode to prevent data loss.
Check if you are in safe mode: hdfs dfsadmin -safemode get
If it is ON, do not leave it manually until your DataNodes have finished reporting their blocks.
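A minimal sketch of the follow-up checks described above, assuming a Hadoop 3 layout; the DataNode host and IPC port in the last command are placeholders:

```bash
# Is the NameNode still in Safe Mode? Leave it ON until the block reports have arrived.
hdfs dfsadmin -safemode get

# Watch recovery progress: "Missing blocks" should fall and per-node "Num of Blocks" should rise.
hdfs dfsadmin -report | grep -E 'Missing blocks|Num of Blocks'

# Optionally ask one DataNode to send its block report immediately
# (replace datanode-host:9867 with your DataNode's IPC address; 9867 is the Hadoop 3 default).
hdfs dfsadmin -triggerBlockReport datanode-host:9867
```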
05-02-2025
01:51 AM
Hi @shubham_sharma, I've tried to reproduce the issue by creating a test Avro table; querying it, I found that it does generate CLOSE_WAIT sockets. Thanks a lot.
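For anyone reproducing this, a quick hedged way to see the leaked sockets on the host running the affected service (root is needed to show the process column; filter by your service's port as appropriate):

```bash
# List TCP sockets stuck in CLOSE_WAIT together with the processes holding them.
ss -tanp state close-wait
```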
04-29-2025
09:43 AM
Hi @MaraWang The rebase to HBase 2.6.0 is planned for upcoming CDP releases. We recommend monitoring our release notes for updates regarding this change.
04-28-2025
07:05 AM
@Shelton Please read my previous answer carefully. None of the properties you provided are in the HBase codebase.
01-25-2025
01:42 AM
1 Kudo
I found a solution for this problem. I removed the Kerberos DB with kdb5_util destroy and recreated it with kdb5_util create -s. Another thing I found was that when I first created the Cloudera admin principal, I used cloudera-scm instead of cloudera-scm/admin. I am not sure whether this caused the problem, but after destroying the old DB and creating cloudera-scm/admin, generating credentials works properly.
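A minimal sketch of the sequence described above, assuming an MIT KDC; the realm below is a placeholder, and destroying the database is irreversible, so back it up first:

```bash
# Destroy the existing KDC database and recreate it with a stash file (-s).
kdb5_util destroy -f
kdb5_util create -s

# Create the admin principal with the /admin instance Cloudera Manager expects (prompts for a password).
kadmin.local -q "addprinc cloudera-scm/admin@EXAMPLE.COM"
```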
01-24-2025
11:53 AM
Hi @CloudSeeker7, if you can restate the question with an example, it would help us check the issue you are facing and narrow down the possible root causes. Please provide an example of both a bad record and a good record.
12-17-2024
12:41 PM
1 Kudo
@JSSSS The error is this: "java.io.IOException: File /user/JS/input/DIC.txt._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation."

According to the log, all 3 DataNodes are excluded (excludeNodes=[192.168.1.81:9866, 192.168.1.125:9866, 192.168.1.8>). With a replication factor of 3, the write must succeed on all 3 DataNodes, otherwise it fails. The cluster may have under-replicated or unavailable blocks because HDFS cannot use these nodes, possibly due to the following (a consolidated command sketch is at the end of this post):
- Disk space issues.
- Write errors or disk failures.
- Network connectivity problems between the NameNode and DataNodes.

1. Verify that the DataNodes are live and connected to the NameNode: hdfs dfsadmin -report
Look for the "Live nodes" and "Dead nodes" sections. If all 3 DataNodes are excluded, they might show up as dead or decommissioned.

2. Ensure the DataNodes have sufficient disk space for the write operation: df -h
Look at the HDFS data directories (/hadoop/hdfs/data). If disk space is full, clear unnecessary files or increase disk capacity: hdfs dfs -rm -r /path/to/old/unused/files

3. View the list of excluded nodes: cat $HADOOP_HOME/etc/hadoop/datanodes.exclude
If nodes are wrongly excluded, remove their entries from datanodes.exclude and refresh the NameNode to apply the changes: hdfs dfsadmin -refreshNodes

4. Block Placement Policy: If the cluster has DataNodes with specific restrictions (e.g., rack awareness), verify the block placement policy: grep dfs.block.replicator.classname $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Default: org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault

Happy hadooping
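A consolidated sketch of the checks above, assuming default paths; the data directory and the exclude-file name are assumptions, so check dfs.hosts.exclude in hdfs-site.xml for the real location:

```bash
# 1. DataNode liveness as seen by the NameNode.
hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes'

# 2. Local disk space on each DataNode (the data directory path is an assumption).
df -h /hadoop/hdfs/data

# 3. Excluded hosts -- the file name is an assumption; dfs.hosts.exclude in hdfs-site.xml defines the real one.
cat "$HADOOP_HOME/etc/hadoop/datanodes.exclude"

# 4. After editing the exclude file, tell the NameNode to re-read it.
hdfs dfsadmin -refreshNodes

# 5. Confirm which block placement policy is configured (no output means the default is in use).
grep -A1 'dfs.block.replicator.classname' "$HADOOP_HOME/etc/hadoop/hdfs-site.xml"
```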
10-14-2024
02:53 PM
1 Kudo
@manyquestions Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.