Member since: 09-03-2020
Posts: 339
Kudos Received: 7
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 179 | 01-09-2026 06:05 AM |
| | 1432 | 04-09-2024 05:59 AM |
| | 1846 | 04-06-2024 12:35 AM |
| | 1883 | 03-21-2024 07:58 AM |
| | 2844 | 03-04-2024 06:04 AM |
01-10-2026
11:06 PM
@EmilKle FYI

➤ This issue typically arises because the comma (,) is a reserved delimiter in the hbase:meta table structure, used to separate the Table Name, Start Key, and Region ID. When a rowkey containing an unexpected comma is inserted, the HBase shell and client API often misinterpret the entry as a malformed region name, causing GET or DELETE commands to route incorrectly or fail validation.

➤ Here is how you can approach a safe repair, as traditional methods like HBCK2 fixMeta often bypass these "illegal" keys when they don't follow the expected region naming convention.

Use the HBase shell with hexadecimal rowkeys. Standard string-based commands in the shell often fail because the shell parses the comma as a delimiter. Instead, find the exact byte representation of the rowkey and delete it using hexadecimal notation (a sketch of the full sequence follows these steps).

1. Find the hex key: run a scan to get the exact bytes of the corrupted row.
   scan 'hbase:meta', {ROWPREFIXFILTER => 'rowkey,'}
2. Delete using the binary string: if the rowkey is exactly "rowkey,", use binary notation in the shell:
   delete 'hbase:meta', "rowkey\x2C", 'info:regioninfo'
   (Note: \x2C is the hex code for a comma.)
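A minimal non-interactive sketch of the two steps above, assuming the corrupt entry is literally the bytes "rowkey," — replace the row with the exact bytes your scan returns before running anything against hbase:meta:

```bash
hbase shell -n <<'EOF'
# 1. Confirm the exact bytes of the bad row
scan 'hbase:meta', {ROWPREFIXFILTER => 'rowkey,'}
# 2. Remove the bogus info:regioninfo cell (\x2C is the comma byte)
delete 'hbase:meta', "rowkey\x2C", 'info:regioninfo'
# 3. Re-scan to verify the entry is gone
scan 'hbase:meta', {ROWPREFIXFILTER => 'rowkey,'}
EOF
```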
01-10-2026
10:40 PM
@scala_ FYI

➤ It appears you have performed an exhaustive verification of the standard Kerberos and HBase configurations. The "GSS initiate failed" error in a Kerberized HBase environment, especially when standard connectivity and ticket validation pass, often points to subtle mismatches in how the Java process handles the security handshake or how the underlying OS interacts with the Kerberos libraries.

➤ Based on the logs and environment details you provided, here are the most likely remaining causes (a quick set of checks follows this list):

1. Java Cryptography Extension (JCE) and encryption types
While you confirmed support for AES256 in krb5.conf, the Java Runtime Environment (JRE) itself may be restricting it.
- The issue: Older versions of Java 8 require the JCE Unlimited Strength Jurisdiction Policy Files to be installed manually to handle 256-bit encryption. If the Master sends an AES256 ticket but the RegionServer's JVM is restricted, the GSS initiation fails.
- The fix: Ensure the JCE policy files are installed, or if you are on a modern OpenJDK, ensure the java.security file allows all encryption strengths. You can also temporarily restrict permitted_enctypes in krb5.conf to aes128-cts-hmac-sha1-96 to see whether the connection succeeds with the weaker cipher.

2. Reverse DNS (rDNS) mismatch
Kerberos is extremely sensitive to how hostnames are resolved.
- The issue: Even with entries in /etc/hosts, Java's GSSAPI often performs a reverse DNS lookup on the Master's IP. If the IP 10.51.39.121 (from your previous logs) resolves to a different hostname than the one in your keytab (host117), or to no hostname at all, the GSS initiation fails.
- The fix: Add rdns = false to the [libdefaults] section of /etc/krb5.conf on all nodes. This forces Kerberos to use the hostname provided by the application rather than resolving the IP back to a name.

3. Service Principal Name (SPN) mismatch
In hbase-site.xml, the principals are usually defined with the _HOST placeholder.
- The issue: If hbase.master.kerberos.principal is set to hbase/_HOST@REALM, HBase replaces _HOST with the fully qualified domain name (FQDN). If your system reports the FQDN as host117.kfs.local but the Kerberos database (KDB) only has hbase/host117@REALM, the handshake fails.
- The fix: Ensure the output of hostname -f exactly matches the principal stored in the keytab.

4. JAAS "Server" vs. "Client" sections
Your earlier logs mentioned: "Added the Server login module in the JAAS config file."
- The issue: In HBase, the RegionServer acts as a Client when connecting to the Master. If your JAAS configuration only has a Server section and is missing a Client section (or the Client section has incorrect keytab details), the RegionServer cannot initiate the GSS context toward the Master.
- The fix: Ensure the JAAS file contains both sections and that the Client section points to the correct RegionServer keytab and principal.
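A few one-liners covering checks 1–3, run on the RegionServer host; the IP, hostname, and keytab path are taken from this thread or assumed, so adjust them to your environment:

```bash
# Ticket encryption types actually issued to the cached principal
klist -e

# Which JVM the RegionServer uses (JCE restrictions are per-JDK install)
java -version

# FQDN must match the _HOST substitution baked into the keytab principal
hostname -f
klist -kt /etc/security/keytabs/hbase.service.keytab   # keytab path is an assumption

# Forward and reverse resolution of the Master's IP
getent hosts 10.51.39.121
dig -x 10.51.39.121 +short

# Current libdefaults relevant to this problem
grep -E 'rdns|permitted_enctypes|default_realm' /etc/krb5.conf
```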
01-10-2026
10:22 PM
@jkoral FYI

➤ Based on the logs provided, the checkpoint failure is caused by an authentication mismatch during the FSImage upload process, further complicated by an underlying storage type configuration issue.

➤ Primary reason: authentication failure (403 Forbidden)
The Standby NameNode (SNN) successfully performs the checkpoint locally but fails to upload the merged fsimage back to the Active NameNode (NN).
- The error: The SNN logs report: java.io.IOException: Exception during image upload: Response: 403 (Forbidden), Message: Non-exception fault: Authentication failed.
- The mechanism: After merging the edits, the SNN attempts to POST the new image to the NN over HTTP. The NN rejects this request because it cannot verify the identity of the SNN, which is common in new clusters where Kerberos or shared-secret configurations are not fully synchronized.

➤ Recommended fixes (see the checks sketched below)
- Verify HTTP authentication: Check the dfs.namenode.secondary.http-address and dfs.namenode.http-address settings. Ensure the hdfs user has consistent permissions across both hosts.
- Check firewall/SELinux: Since this is RHEL 9, ensure that the SNN can reach the NN on port 9870 (or 9871 if SSL is enabled).
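A quick way to exercise both points from the Standby NameNode host; the Active NN hostname and keytab details are placeholders, and the curl call only confirms that SPNEGO authentication against the NN web endpoint works for the hdfs principal:

```bash
# What each side thinks the HTTP endpoints are
hdfs getconf -confKey dfs.namenode.http-address
hdfs getconf -confKey dfs.namenode.https-address

# Authenticate as the hdfs principal and hit the Active NN web port with SPNEGO
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-cluster@EXAMPLE.COM   # path/principal are assumptions
curl -sk --negotiate -u : "http://active-nn.example.com:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"

# RHEL 9: confirm nothing on the OS side blocks the image transfer
sudo firewall-cmd --list-ports
getenforce
```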
01-10-2026
12:24 AM
@rizalt FYI

➤ It appears your cluster is experiencing a quorum failure, where the critical metadata services (NameNode, JournalNodes, and ZooKeeper) lose the ability to maintain a majority when one Data Center (DC) goes offline.

➤ Analyzing your failure
In a High Availability (HA) setup, the NameNodes rely on a quorum of JournalNodes (JN) and ZooKeeper (ZK) nodes to stay alive. If you have 5 JNs, at least 3 must be running for either NameNode to function. Based on your diagram:
- The problem: If you split your 5 JournalNodes across two sites (e.g., 2 in DC1 and 3 in DC2) and the DC with 3 JNs goes down, the remaining 2 JNs cannot form a quorum. Both NameNodes then shut down immediately to prevent data corruption ("split brain").
- DataNodes vs. NameNodes: In HDFS, the number of DataNodes (DN) that fail has no direct impact on whether the NameNode stays up or down. You could lose 15 out of 16 DataNodes and the NameNode would still stay active. The fact that your NameNodes crash when 8 DNs (one full server/DC) go down proves that your quorum nodes (JN/ZK) are failing, not the DataNodes.

➤ The solution: quorum placement
To survive a full Data Center failure (50% of your physical infrastructure), you cannot rely on an even split of nodes. You need a third location (a "witness" site) or an asymmetric distribution.

1. The 3-site strategy (recommended)
To handle a one-DC failure with 5 JournalNodes and 5 ZooKeeper nodes, place them as follows:
- DC 1: 2 JN, 2 ZK
- DC 2: 2 JN, 2 ZK
- Site 3 (witness): 1 JN, 1 ZK (this can be a very small virtual machine or cloud instance).
Why this works: if DC1 or DC2 fails, the surviving DC plus the witness site still have 3 nodes, which satisfies the quorum (3 > 5/2).

2. Maximum DataNode failure
- Theoretically: you can lose all but one DataNode (N-1) and the NameNode will stay active.
- Practically: with a replication factor of 3, losing 50% of your nodes leaves many blocks under-replicated, and some may go missing if all three copies lived in the DC that died.
- Solution: configure Rack Awareness so HDFS knows which nodes belong to which DC. This forces HDFS to keep at least one copy of the data in each DC.

➤ Why 11 JN and 11 ZK didn't work
Increasing the node count to 11 actually makes the cluster more fragile if they are placed in only two locations. With 11 nodes you need 6 alive to form a quorum. If you have 5 in DC1 and 6 in DC2, and DC2 fails, the 5 remaining nodes in DC1 cannot reach the 6-node requirement.

➤ Checklist for survival (a quick verification sketch follows)
- Reduce to 5 JNs and 5 ZKs: too many nodes increase network latency and management overhead.
- Add a 3rd location: even a single low-power node in a different building or cloud region can act as the tie-breaker.
- Check dfs.namenode.shared.edits.dir: ensure the NameNodes are configured to point to all JournalNodes by URI.
- ZooKeeper FC: ensure the DFS ZKFailoverController is running on both NameNode hosts.
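A short verification pass for the checklist items, assuming a nameservice called mycluster and placeholder JournalNode hostnames — the point is to see all five JNs in a single qjournal URI and every host mapped to its DC in the topology output:

```bash
# All NameNodes must list every JournalNode in one qjournal:// URI
hdfs getconf -confKey dfs.namenode.shared.edits.dir
# e.g. qjournal://jn1-dc1:8485;jn2-dc1:8485;jn1-dc2:8485;jn2-dc2:8485;jn-witness:8485/mycluster

# Rack/DC mapping script and the topology HDFS actually sees
hdfs getconf -confKey net.topology.script.file.name
hdfs dfsadmin -printTopology

# Failover controllers and current HA state on each NameNode host
jps | grep -i DFSZKFailoverController
hdfs haadmin -getAllServiceState
```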
01-09-2026
11:52 PM
➤ It sounds like you are encountering a common HDFS issue where metadata overhead and block minimums cause a large discrepancy between your actual data size and your disk utilization. While 650 files at 4 MB each is only about 2.6 GB of data, the way HDFS manages these on the physical disks (especially in small or test clusters) can lead to unexpected storage consumption.

➤ Root causes of the 100% utilization
1. Reserved space and "Non-DFS Used"
HDFS does not get the entire disk. By default, Hadoop reserves a portion of each disk for the OS and non-Hadoop data (defined by dfs.datanode.du.reserved). If you are running on small disks (e.g., 20–50 GB), the combination of your data, logs, and reserved space can quickly hit the 100% threshold.

2. Local filesystem block overheads
Even though your HDFS block size is 4 MB, the underlying OS filesystem (EXT4 or XFS) uses its own block size (usually 4 KB). The metadata for 650 individual files, their checksum (.meta) files, and the edit logs on the NameNode create a "death by a thousand cuts" scenario on small disks.

3. Log accumulation
Check /var/log/hadoop or your configured log directory. In HDFS 3.3.5, if a cluster is struggling with space, the DataNodes and NameNodes generate large amounts of "Heartbeat" and "Disk Full" logs, which consume the remaining non-DFS space and push the disk to 100%.

➤ How to tackle the situation (a combined sketch of these steps follows)
Step 1: Identify where the space is going
Run the following command to see whether the space is taken by HDFS data or other files:
   $ hdfs dfsadmin -report
- DFS Used: space taken by your 650 files.
- Non DFS Used: space taken by logs, the OS, and other applications. If this is high, your logs are the likely culprit.

Step 2: Clear logs and temporary data
If "Non DFS Used" is high, clear out rotated files in the Hadoop log directory:
   # Example path
   rm -rf /var/log/hadoop/hdfs/*.log.*
   rm -rf /var/log/hadoop/hdfs/*.out.*

Step 3: Adjust the reserved-space threshold
A DataNode stops accepting new blocks once its disk is nearly full. If you are in a test environment and need to squeeze out more space, you can lower the reserved space in hdfs-site.xml:
   <property>
     <name>dfs.datanode.du.reserved</name>
     <value>1073741824</value>
   </property>

Step 4: Combine small files (long-term fix)
HDFS is designed for large files; 650 files of 4 MB each are "small files."
- The problem: every file, regardless of size, takes up roughly 150 bytes of NameNode RAM and creates separate metadata entries.
- The solution: use the getmerge command or a MapReduce/Spark job to combine these 650 files into 2 or 3 larger files (e.g., 1 GB each).
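A sketch that strings Steps 1, 2, and 4 together; the HDFS paths and the merged target directory are placeholder assumptions:

```bash
# Step 1: where did the space go?
hdfs dfsadmin -report | grep -E 'DFS Used|Non DFS Used|DFS Remaining'
du -sh /var/log/hadoop* 2>/dev/null

# Step 2: drop only rotated logs, keep the live ones
find /var/log/hadoop/hdfs -name '*.log.*' -o -name '*.out.*' | xargs -r rm -f

# Step 4: consolidate the 650 small files into one file and re-upload it
hdfs dfs -getmerge /data/small_files /tmp/merged.dat
hdfs dfs -mkdir -p /data/merged
hdfs dfs -put -f /tmp/merged.dat /data/merged/part-00000
```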
01-09-2026
11:30 PM
@G_B FYI

➤ Based on your report, your data is physically safe on the disks, but the HDFS metadata link is broken because your DataNodes cannot talk to your NameNode properly. Here is the breakdown of why this is happening and how to fix it.

1. The result: "Missing Blocks" and "Forbidden" errors
- Missing blocks (6303): your NameNode knows the files should exist (the metadata is loaded), but because the DataNode blockReport failed with the Java error, the DataNodes have not told the NameNode which blocks they hold.
- Num of Blocks: 0: look at the per-DataNode section of your dfsadmin report. It says Num of Blocks: 0, so the NameNode thinks that node is empty because the block report failed.
- Head of file / Forbidden: since the NameNode believes there are 0 block replicas available, it tells your client it has no DataNodes to serve the file from.

2. Solution step 1: restart all Hadoop services (NameNode first, then DataNodes).

3. Solution step 2: clear the "lease" and trigger block reports (see the sketch below)
- Check the DataNode logs; you want to see: Successfully sent block report.
- Once the reports succeed, run the report again: hdfs dfsadmin -report.
- The "Missing Blocks" count should start dropping toward zero, and Num of Blocks should increase.

4. Troubleshooting the "File Not Found" in fsck
The reason fsck said the file didn't exist while ls showed it is most likely NameNode Safe Mode. When a NameNode starts up and sees 6,000+ missing blocks, it often enters Safe Mode to prevent data loss. Check whether you are in safe mode:
   hdfs dfsadmin -safemode get
If it is ON, do not leave it manually until your DataNodes have finished reporting their blocks.
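A sketch of the block-report step, assuming the Hadoop 3 default DataNode IPC port 9867 (check dfs.datanode.ipc.address if you changed it) and a placeholder DataNode hostname:

```bash
# Ask one DataNode to resend its full block report immediately
hdfs dfsadmin -triggerBlockReport dn1.example.com:9867

# Watch the cluster recover
hdfs dfsadmin -safemode get
hdfs dfsadmin -report | grep -E 'Missing blocks|Num of Blocks'

# Once the missing-block count reaches 0, a full fsck should come back HEALTHY
hdfs fsck / | tail -n 20
```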
01-09-2026
10:46 PM
@Hadoop16 FYI

➤ This error occurs because of a token delegation gap between Hive and the HDFS Router. In a Kerberized cluster, when Hive (running on a DataNode/compute node via Tez or MapReduce) attempts to write to HDFS, it needs a Delegation Token. When you use an HDFS Router address, Hive must be explicitly told to obtain a token for the Router's service principal, which may be different from the backend NameNodes'.

➤ The root cause
The error Client cannot authenticate via:[TOKEN, KERBEROS] at the FileSinkOperator stage indicates that the tasks running on your worker nodes do not have a valid token to "speak" to the Router at router_host:8888. When Hive plans the job, it usually fetches tokens for the default filesystem. If your fs.defaultFS is set to a regular NameNode but your table location is an RBF address, Hive might not fetch the secondary token required for the Router.

➤ The fix: configure token requirements
You need to ensure Hive and the underlying MapReduce/Tez framework know to fetch tokens for the Router's URI.

1. Add the Router URI to Hive's token list
In your Hive session (or globally in hive-site.xml), define the Router as a "known" filesystem that requires tokens (a session-level sketch appears at the end of this reply):
   SET hive.metastore.token.signature=hdfs://router_host:8888;
   SET mapreduce.job.hdfs-servers=hdfs://router_host:8888,hdfs://nameservice-backend;

2. Configure the HDFS client to trust the Router for tokens
In core-site.xml or hdfs-site.xml, enable the Router to act as a proxy for the backend NameNodes so it can pass the tokens correctly:
   <property>
     <name>dfs.federation.router.delegation.token.enable</name>
     <value>true</value>
   </property>

➤ Critical Kerberos configuration
Because the Router is an intermediary, it must be allowed to impersonate the user (Hive) when talking to the backend. Ensure your proxyuser settings in core-site.xml include the Router's service principal. Assuming your Router runs as the hdfs or router user:
   <property>
     <name>hadoop.proxyuser.router.groups</name>
     <value>*</value>
   </property>
   <property>
     <name>hadoop.proxyuser.router.hosts</name>
     <value>*</value>
   </property>

➤ Diagnostic verification
To prove whether the token is missing, run these commands from the datanode_host mentioned in your error logs, as the same user running the Hive job:
   # Check whether you can manually get a token for the router
   hdfs fetchdt --renewer hdfs hdfs://router_host:8888 router.token
   # Check the contents of your current credentials cache
   klist -f
If fetchdt fails, the issue is with the Router's ability to issue tokens. If it succeeds but Hive still fails, the issue is with Hive's job submission not including the Router URI in the mapreduce.job.hdfs-servers list.
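A session-level sketch of fix #1 run through beeline; the JDBC URL, table names, and the Tez counterpart property tez.job.fs-servers are assumptions for illustration — the key point is that both the Router URI and the backend nameservice appear in the token list before the failing INSERT runs:

```bash
beeline -u "jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" <<'EOF'
-- Ask the job framework to fetch delegation tokens for both filesystems
SET mapreduce.job.hdfs-servers=hdfs://router_host:8888,hdfs://nameservice-backend;
SET tez.job.fs-servers=hdfs://router_host:8888,hdfs://nameservice-backend;

-- Re-run the statement that previously failed at the FileSinkOperator stage
INSERT INTO TABLE target_table SELECT * FROM source_table;
EOF
```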
01-09-2026
09:51 PM
@allen_chu FYI

➤ This issue — high CPU usage, a large number of threads stuck in DataXceiver, and a high load average — is a classic symptom of TCP socket leakage or hanging connections within the HDFS data transfer protocol.

➤ Based on your top output and jstack, here is the detailed breakdown of what is happening and how to resolve it.

➤ Analysis of the symptoms
1. CPU saturation (99% per thread): your top output shows dozens of DataXceiver threads each consuming nearly 100% CPU. This usually indicates that the threads are in a busy-wait or spinning state inside the NIO epollWait call.
2. Stuck in epollWait: the jstack shows threads sitting in sun.nio.ch.EPollArrayWrapper.epollWait. While this is a normal state for a thread waiting for I/O, in your case these threads are likely waiting for a packet from a client that has already disconnected or is half-closed, and the DataNode has not timed the connection out.
3. Thread exhaustion: with 792 threads, your DataNode is approaching its dfs.datanode.max.transfer.threads limit (default 4096, often throttled further by the OS ulimit). As these threads accumulate, the DataNode loses the ability to accept new I/O requests and becomes unresponsive.

➤ Recommended solutions
1. Tighten socket timeouts (immediate fix)
The most common cause is that the DataNode waits too long for a slow or dead client. Tighten the transfer timeouts to force these "zombie" threads to close. Update your hdfs-site.xml:
- dfs.datanode.socket.write.timeout: often set to 0 (no timeout) or several minutes; set it to 300000 (5 minutes).
- dfs.datanode.socket.reuse.keepalive: a keepalive window in milliseconds (default 4000) rather than a boolean; keep it at the default to allow connection reuse.
- dfs.datanode.transfer.socket.send.buffer.size and recv.buffer.size: set these to 131072 (128 KB) to optimize throughput and prevent stalls.

2. Increase the maximum receiver threads
If your cluster handles high-concurrency workloads (such as Spark or HBase), the default thread count might be too low:
   <property>
     <name>dfs.datanode.max.transfer.threads</name>
     <value>16384</value>
   </property>

3. Check for half-closed network connections
Since the threads are stuck in read, the OS may be keeping sockets in CLOSE_WAIT or FIN_WAIT2 states.
a.] Check socket status: run netstat -anp | grep 9866 | awk '{print $6}' | sort | uniq -c.
b.] OS tuning: make the Linux kernel close dead connections more aggressively by adding these to /etc/sysctl.conf:
   net.ipv4.tcp_keepalive_time = 600
   net.ipv4.tcp_keepalive_intvl = 60
   net.ipv4.tcp_keepalive_probes = 20

4. Address HDFS-14569 (software bug)
Hadoop 3.1.1 is susceptible to a known issue where DataXceiver threads can leak during block moves or heavy balancer activity.
- Issue: DataXceiver fails to exit if a client stops sending data mid-packet but keeps the TCP connection open.
- Recommendation: if possible, upgrade to Hadoop 3.2.1+ or 3.3.x; these versions contain significantly improved NIO handling and better logic for terminating idle xceivers.

➤ Diagnostic step: finding the "bad" clients
To identify which clients are causing this, run this on the DataNode (see the sketch below):
   netstat -atnp | grep ':9866' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
This tells you which IP addresses hold the most connections to the data transfer port. If one specific IP (a single Spark executor or a particular user's edge node) has hundreds of connections, that client's code is likely not closing DFSClient instances correctly.
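A small diagnostic sketch for points 1 and 3, assuming the Hadoop 3 default data transfer port 9866 and that the DataNode is the only process matching the pgrep pattern:

```bash
# How many DataXceiver threads are live right now
DN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -n1)
jstack "$DN_PID" | grep -c 'DataXceiver'

# Sockets stuck half-closed on the data transfer port
ss -tan state close-wait '( sport = :9866 )' | grep -c ':9866'

# Which peers hold the most connections to this DataNode
ss -tan 'sport = :9866' | awk 'NR>1 {sub(/:[0-9]+$/, "", $5); print $5}' | sort | uniq -c | sort -nr | head
```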
01-09-2026
09:41 PM
@Maddy2 FYI

➤ Based on the logs you provided, your NameNode is failing to start because it hit a metadata inconsistency while replaying the edit logs. This is a critical issue where the NameNode's current state (from the FSImage) contradicts the instructions in the edit logs it is trying to apply.

➤ The root cause
The specific error is a java.lang.IllegalStateException during an OP_MKDIR operation (Transaction ID: 29731504). The NameNode is trying to create a directory (/tmp/hive/nifi/...), but the checkState fails because the parent directory for that path does not exist in the namespace it just loaded from the FSImage. This likely happened because:
- Disk expansion/reboot out of sync: when you expanded the disk and rebooted, one of the storage directories (/mnt/resource/hadoop/hdfs/namenode) was flagged as unformatted or empty.
- Metadata corruption: there is a mismatch between your last successful checkpoint (fsimage_0000000000029731317) and the subsequent edits stored in your JournalNodes.

➤ Recommended solution: metadata recovery
Since this is an HDP (Hortonworks Data Platform) cluster with High Availability (HA), you should attempt to recover by syncing from the "good" metadata or forcing a metadata skip.

Step 1: Identify the healthy NameNode
Make sure you are working on the NameNode that has the most recent and intact data. Check the other NameNode's logs to see whether it also fails at the same transaction ID.

Step 2: Check permissions on the standby or failing NameNode
Check the permissions of the edit logs and fsimage under dfs.namenode.name.dir and confirm they match the permissions on the Active NameNode.

Step 3: Bootstrap from the standby (if HA is healthy)
If one NameNode can start or has better metadata, re-sync the failing node (a command sketch follows):
1. Stop the failing NameNode.
2. On the failing node, clear (or, better, move aside) the NameNode storage directories defined in dfs.namenode.name.dir.
3. Run the bootstrap command to pull metadata from the active/healthy NameNode:
   $ hdfs namenode -bootstrapStandby
4. Start the NameNode.
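A sketch of Step 3; the metadata path is a placeholder for your dfs.namenode.name.dir, and on an Ambari-managed HDP cluster you would normally stop/start the NameNode from Ambari instead of the --daemon commands shown here:

```bash
# On the failing NameNode host
sudo -u hdfs hdfs --daemon stop namenode

# Move the old metadata aside instead of deleting it outright
sudo mv /hadoop/hdfs/namenode /hadoop/hdfs/namenode.bak.$(date +%F)
sudo -u hdfs mkdir -p /hadoop/hdfs/namenode

# Pull a fresh fsimage from the healthy/active NameNode, then start
sudo -u hdfs hdfs namenode -bootstrapStandby
sudo -u hdfs hdfs --daemon start namenode
```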
01-09-2026
06:05 AM
1. Does cp just restore the inode?
No. When you use the cp command, the system does not simply "re-link" the old metadata or inode. Instead, it creates brand-new files: cp reads the data blocks from the snapshot and writes them to the target directory as new blocks. The resulting files will have new inode IDs, new timestamps, and your current user as the owner. It is a heavy operation because it physically duplicates the data on the cluster.

2. What happens to unmodified files?
hdfs dfs -cp does not perform a diff or check whether the content is identical. It reads every matched file from the snapshot and writes it to the live directory again, resulting in unnecessary I/O and network traffic even for files that never changed.

3. Are those files skipped?
No. Using cp with a wildcard (*) forces the system to attempt to copy everything. If a file with the same name already exists in the target, the command fails with a "File exists" error unless you pass -f (overwrite) or delete the target first.

➤ The better way (recommended fix)
If the goal is to restore only metadata or only the files that actually changed, cp is inefficient. Instead:

A. Use distcp with the -update flag (sketch below)
distcp is much smarter: it compares the source (snapshot) and the target (live directory) and only copies files whose sizes or checksums differ.
   hadoop distcp -update -ppta <snaproot>/.snapshot/<name>/ <target_dir>/
- -update: only copies a file if its size/checksum differs.
- -ppta: preserves the original permissions, timestamps, and ACLs.

B. Manual "restore" (metadata only)
If you only changed permissions (chmod) and didn't touch the data, the most efficient "restore" isn't a copy at all: simply run chmod again to set them back. Snapshots are great insurance, but they are most useful for data recovery, not for undoing metadata changes.
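A concrete sketch of option A with placeholder paths and snapshot name; the final chmod line illustrates option B for a hypothetical directory whose permissions were clobbered:

```bash
# Option A: copy back only the files that differ from the snapshot
hadoop distcp -update -ppta \
  hdfs:///data/warehouse/.snapshot/s2026-01-08 \
  hdfs:///data/warehouse

# Option B: if only permissions changed, just re-apply them — no data copy needed
hdfs dfs -chmod -R 750 /data/warehouse/sales   # hypothetical path and mode
```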