Member since: 07-14-2020
Posts: 165
Kudos Received: 15
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1521 | 05-24-2024 12:56 AM |
| | 4040 | 05-16-2024 12:20 AM |
01-09-2026
09:41 PM
@Maddy2 FYI
➤ Based on the logs you provided, your NameNode is failing to start because it hit a metadata inconsistency while replaying the edit logs. This is a critical issue: the NameNode's current state (loaded from the FSImage) contradicts the instructions in the edit logs it is trying to apply.
➤ The Root Cause
The specific error is a java.lang.IllegalStateException during an OP_MKDIR operation (Transaction ID: 29731504). The NameNode is trying to create a directory (/tmp/hive/nifi/...), but the checkState fails because the parent directory of that path does not exist in the namespace it just loaded from the FSImage. This likely happened because:
- Disk expansion/reboot out of sync: when you expanded the disk and rebooted, one of the storage directories (/mnt/resource/hadoop/hdfs/namenode) was flagged as unformatted or empty.
- Metadata corruption: there is a mismatch between your last successful checkpoint (fsimage_0000000000029731317) and the subsequent edits stored in your JournalNodes.
➤ Recommended Solution: Metadata Recovery
Since this is an HDP (Hortonworks Data Platform) cluster with High Availability (HA), you should attempt to recover by syncing from the "good" metadata or forcing a metadata skip.
=> Step 1: Identify the healthy NameNode. Make sure you are working on the NameNode that has the most recent and intact data. Check the other NameNode's logs to see whether it also fails at the same transaction ID.
=> Step 2: On the standby or failing NameNode, check the permissions of the edit logs and fsimage files under dfs.namenode.name.dir and confirm they match the permissions on the active NameNode.
=> Step 3: Bootstrap from the standby (if HA is healthy). If one NameNode is able to start or has better metadata, re-sync the failing node (see the command sketch below):
1. Stop the failing NameNode.
2. On the failing node, clear the NameNode storage directories (as defined in dfs.namenode.name.dir).
3. Run the bootstrap command to pull metadata from the active/healthy NameNode: $ hdfs namenode -bootstrapStandby
4. Start the NameNode.
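A minimal command sketch of Step 3 on the failing node, assuming the storage directory is the /mnt/resource/hadoop/hdfs/namenode path seen in your logs and that the NameNode is stopped and started through Ambari; verify dfs.namenode.name.dir on your cluster before deleting anything:
# Confirm the configured NameNode storage directory.
$ hdfs getconf -confKey dfs.namenode.name.dir
# Stop the failing NameNode (e.g. from Ambari), then back up and clear its metadata directory.
$ sudo tar -czf /var/tmp/nn-metadata-backup.tar.gz /mnt/resource/hadoop/hdfs/namenode
$ sudo rm -rf /mnt/resource/hadoop/hdfs/namenode/current
# Pull fresh metadata from the active/healthy NameNode, then start the NameNode again from Ambari.
$ sudo -u hdfs hdfs namenode -bootstrapStandby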
09-17-2025
05:11 AM
Hi Team, The Kudu leader master maintains the entire metadata state in memory. This includes all table schemas, tablet locations, consensus state, and other cluster metadata. If your cluster has a large number of tables, tablets, partitions, or complex schemas, the metadata size can grow substantially. Tablet servers use less memory because they mainly hold tablet data and cache; metadata lives primarily on the masters. This is likely why you are seeing high memory usage on the Kudu leader master node.
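As a quick check (a sketch, assuming the master web UI is listening on the default port 8051 and jq is available), you can pull memory-related metrics from the leader master's /metrics endpoint:
# List metrics whose names mention memory or allocation on the leader master.
$ curl -s http://<leader-master-host>:8051/metrics | jq '.[].metrics[] | select(.name | test("memory|allocated"))'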
07-30-2025
07:42 AM
bc3dcd485adfa1c339eab38f1516c6c5 >> This alphanumeric ID could be a tablet ID from Kudu, a region from HBase, or a container from Ozone. Did you get a chance to check the Recon UI?
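If it turns out to be a Kudu tablet ID, one way to confirm (a sketch; the master addresses are placeholders for your cluster):
# Run a cluster health check and search for the ID in question.
$ sudo -u kudu kudu cluster ksck <master1>,<master2>,<master3> | grep bc3dcd485adfa1c339eab38f1516c6c5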
12-12-2024
09:48 AM
1 Kudo
@cc_yang It is possible that an HDFS space quota has been set on the directory and the directory has reached its hard limit, which would make the file upload fail with an insufficient-space message. Read more about HDFS quotas here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html
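A quick way to verify (a sketch; /path/to/dir is a placeholder for the directory you are uploading to):
# Show the name quota, space quota, and remaining quota for the directory.
$ hdfs dfs -count -q -h /path/to/dir
# If the space quota is the culprit, an HDFS admin can raise or clear it.
$ hdfs dfsadmin -setSpaceQuota 500g /path/to/dir
$ hdfs dfsadmin -clrSpaceQuota /path/to/dir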
12-11-2024
01:16 AM
1 Kudo
@VidyaSargur it somewhat helped. It was failing because we had an NFS client running on that server. Since we have a customer-facing client -> server architecture for NFS, we could not start the HDFS NFS Gateway again on the same port. So, the only solution was to stop the HDFS NFS Gateway.
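For anyone hitting the same conflict, a quick way to see what is already bound to the NFS port before starting the HDFS NFS Gateway (a sketch; 2049 is the standard NFS port):
# Show which process currently owns port 2049.
$ sudo ss -ltnp | grep ':2049'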
11-20-2024
11:31 PM
1 Kudo
@sde_20241, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future. However, if you still have concerns, please provide the information that @Asfahan has requested.
11-15-2024
02:26 AM
1 Kudo
Hi Team, if you are storing the backup in HDFS with a root path like
-rootPath file:///[***DIRECTORY TO USE FOR BACKUP***]
then use the command below to check its size:
$ hdfs dfs -du -s -h file:///[***DIRECTORY TO USE FOR BACKUP***]
If the backup is on different storage such as Ozone (ofs/o3), the hdfs command will also work. If it is on S3, use the AWS CLI instead (see the sketch below).
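For the S3 case, a sketch using the AWS CLI (the bucket and prefix below are placeholders):
# Total object count and size under the backup prefix.
$ aws s3 ls s3://<backup-bucket>/<backup-prefix>/ --recursive --summarize --human-readable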
11-15-2024
02:20 AM
1 Kudo
Hi Team, It does look like too much data is being written to the tablet server (TS), which is causing it to reach 100%. Did you check with the top command how much the kudu process is consuming? (A quick sketch follows below.) To fix it:
1. Please use the API in batch mode; don't run all the jobs in one go, as Kudu works better with batched writes.
2. If possible, reduce the load by adding tablet servers and data disks.
3. Also check whether any third-party software (such as antivirus) is running; it is not recommended and can cause a huge spike in CPU.
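A sketch of that check, assuming the tablet server process is named kudu-tserver:
# Show CPU and memory usage of the Kudu tablet server process only (one batch iteration).
$ top -b -n 1 -p "$(pgrep -d',' -f kudu-tserver)"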
11-15-2024
02:17 AM
1 Kudo
1. Does the memory consumption come down once you restart this server?
2. The leader master will have higher consumption, but in your case it looks like a bug to me. What CDH/CDP version are you using?
11-15-2024
02:15 AM
1 Kudo
It is because of the disk change: the UUID of this tablet server was different, but after the WAL change it created a new one, hence you are seeing the wrong_server_UUID error. A simple restart of this tablet server will fix the issue in most cases. If not, please rebuild this tablet server from scratch by deleting its data and WAL directories; that will solve the issue (see the sketch below). Note: only proceed with the rebuild if you have RF=3 in the cluster, otherwise it will be a data-loss scenario.
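A minimal sketch of that rebuild, assuming example paths for fs_wal_dir and fs_data_dirs; verify the real values in the tablet server's gflagfile before deleting anything, and only proceed with RF=3:
# Stop the tablet server (via Cloudera Manager / Ambari or the service script), then confirm its configured directories.
$ grep -E 'fs_(wal|data)_dir' /etc/kudu/conf/tserver.gflagfile
# Remove the old WAL and data directories so the server comes back with a fresh state (example paths).
$ sudo rm -rf /data/kudu/tserver/wal /data/kudu/tserver/data
# Start the tablet server again; it will re-register and Kudu will re-replicate its tablets.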