Member since
10-11-2022
133
Posts
20
Kudos Received
11
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 442 | 08-19-2025 01:50 AM |
| | 1893 | 11-07-2024 10:00 PM |
| | 2497 | 05-23-2024 11:44 PM |
| | 2197 | 05-19-2024 11:32 PM |
| | 9634 | 05-18-2024 11:26 PM |
05-05-2026
03:48 AM
1. Cloudera Operational Database backup guide:
https://docs.cloudera.com/operational-database/cloud/managing-database/topics/cod-backing-up-table.html
https://docs.cloudera.com/cdp-public-cloud/cloud/requirements-azure/topics/mc-az-minimal-setup-for-cloud-storage.html
HBase snapshot export is also covered in the Runtime HBase backup docs.
2. Default backup location: it mainly stores cluster-level backups (Data Lake, FreeIPA, logs/telemetry), not HBase table data.
3. Using a different container for manual backups: pass --snapshot-location abfss://othercontainer@account.dfs.core.windows.net/path in the snapshot/export command. No direct Hadoop configuration edit is needed in CDP Public Cloud.
4. Granting HBase access rights:
- Assign the Storage Blob Delegator role to the HBase Managed Identity (at the Storage Account level).
- Grant Storage Blob Data Owner/Contributor on the target container.
- Set POSIX ACLs (Execute + Read/Write) on the root and target path via Storage Explorer.
- Verify the IDBroker mapping for the HBase user.
- Test access with hdfs dfs -ls on the target path.
- Enable "Allow trusted Microsoft services" if the option is available.
@MintberryCrunch
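As a rough sketch of point 3: an HBase snapshot can be exported to a different ADLS container with the standard ExportSnapshot tool. The snapshot name and container path below are placeholders; the command is composed and printed so it can be reviewed before running on a cluster gateway node.

```shell
# All names here are illustrative placeholders.
SNAPSHOT="mytable_snap_2025"
TARGET="abfss://othercontainer@account.dfs.core.windows.net/hbase-backups"

# Compose the export command; -mappers controls copy parallelism.
CMD="hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $SNAPSHOT -copy-to $TARGET -mappers 4"
echo "$CMD"   # review, then run on a node with HBase client configs
```

The HBase Managed Identity must already have write access on the target container (step 4 above), or the export fails with an authorization error.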
05-05-2026
03:16 AM
Enabling a CMK at the environment level is meant for new encryption use in that environment, not for changing how already-running services are encrypted. It should not disrupt existing CDE, CDF, or CML services that are already deployed: they continue using the encryption setup they already have. The CMK choice applies to new resources and clusters created after it is configured, so in practice the main impact is on future deployments, not on currently installed services. The CMK setting is typically a one-time configuration for the environment. @Lorenzo_F
03-17-2026
02:49 PM
Hello @APentyala Could you please let us know whether the solution provided by @RAGHUY fixed your problem? If you still face the same issue, let us know so we can help further.
02-08-2026
12:10 AM
@MarlinGomez For that CCA175 streaming scenario (inconsistent formats, cleansing/transforming to HDFS), Spark Structured Streaming with schema evolution is the most exam-realistic pick. It handles real-time ingestion efficiently via micro-batches, infers and evolves schemas on the fly (especially with JSON/Avro), and lets you apply transformations like filter/map before writing Parquet to HDFS. Separate ETL pipelines per format add too much complexity and overhead for exam constraints, and pure schema-on-read skips proactive cleansing. Quick start with a Kafka source and schema merging enabled: .option("mergeSchema", "true").writeStream... to HDFS. This fits the "perform ETL on data using Spark API" objective perfectly. Good luck on your prep.
01-09-2026
09:51 PM
@allen_chu FYI

➤ This issue (high CPU usage, a large number of threads stuck in DataXceiver, and a high load average) is a classic symptom of TCP socket leakage or hung connections within the HDFS Data Transfer Protocol. Based on your top output and jstack, here is a breakdown of what is happening and how to resolve it.

➤ Analysis of the Symptoms
1. CPU saturation (99% per thread): your top output shows dozens of DataXceiver threads each consuming nearly 100% CPU. This usually indicates the threads are in a busy-wait or spinning state within the NIO epollWait call.
2. Stuck in epollWait: the jstack shows threads sitting in sun.nio.ch.EPollArrayWrapper.epollWait. While this is a normal state for a thread waiting for I/O, in your case these threads are likely waiting for a packet from a client that has already disconnected or is half-closed, but the DataNode has not timed out the connection.
3. Thread exhaustion: with 792 threads, your DataNode is approaching its dfs.datanode.max.transfer.threads limit (default 4096, but often throttled by the OS ulimit). As these threads accumulate, the DataNode loses the ability to accept new I/O requests and becomes unresponsive.

➤ Recommended Solutions
1. Increase socket timeouts (immediate fix). The most common cause is the DataNode waiting too long for a slow or dead client. Tighten the transfer timeouts to force these zombie threads to close. Update your hdfs-site.xml:
   - dfs.datanode.socket.write.timeout: often 0 (no timeout) or several minutes by default. Set this to 300000 (5 minutes).
   - dfs.datanode.socket.reuse.keepalive: set to true for better connection management.
   - dfs.datanode.transfer.socket.send.buffer.size and recv.buffer.size: ensure these are set to 131072 (128 KB) to optimize throughput and prevent stalls.
2. Increase the max receiver threads. If your cluster handles high-concurrency workloads (like Spark or HBase), the default thread count might be too low:
   <property>
     <name>dfs.datanode.max.transfer.threads</name>
     <value>16384</value>
   </property>
3. Check for half-closed network connections. Since the threads are stuck in read, the OS may be keeping sockets in CLOSE_WAIT or FIN_WAIT2 states.
   a. Check socket status: run netstat -anp | grep 9866 | awk '{print $6}' | sort | uniq -c
   b. OS tuning: adjust the Linux kernel to close dead connections more aggressively. Add these to /etc/sysctl.conf:
      net.ipv4.tcp_keepalive_time = 600
      net.ipv4.tcp_keepalive_intvl = 60
      net.ipv4.tcp_keepalive_probes = 20
4. Address HDFS-14569 (software bug). Hadoop 3.1.1 is susceptible to a known issue where DataXceiver threads can leak during block moves or heavy Balancer activity: a DataXceiver fails to exit if a client stops sending data mid-packet but keeps the TCP connection open. If possible, upgrade to Hadoop 3.2.1+ or 3.3.x; these versions contain significantly improved NIO handling and better logic for terminating idle Xceivers.

➤ Diagnostic Step: Finding the "Bad" Clients
To identify which clients are causing this, run this command on the DataNode:
netstat -atp | grep DataXceiver | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
This will tell you which IP addresses hold the most DataXceiver connections. If one specific IP (such as a single Spark executor or a specific user's edge node) has hundreds of connections, that client's code is likely not closing DFSClient instances correctly.
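The per-client counting pipeline from the diagnostic step can be sanity-checked on canned netstat-style lines (the IPs below are made up; field 5 is the remote address). The awk/cut/sort/uniq chain is the same one used above:

```shell
# Fake netstat output, just to exercise the counting pipeline.
sample='tcp 0 0 10.0.0.5:9866 10.0.0.21:51000 ESTABLISHED 123/DataXceiver
tcp 0 0 10.0.0.5:9866 10.0.0.21:51002 ESTABLISHED 123/DataXceiver
tcp 0 0 10.0.0.5:9866 10.0.0.30:41000 ESTABLISHED 123/DataXceiver'

# Count connections per client IP, busiest client first.
result=$(printf '%s\n' "$sample" | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr)
printf '%s\n' "$result"
```

On the sample above, 10.0.0.21 comes out first with two connections; on a real DataNode, a client holding hundreds of entries here is your likely leaker.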
11-13-2025
03:14 AM
Hello, Please try using the hdfs mover command. Refer: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool
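For context, a typical Mover workflow is: tag the path with a colder storage policy, then run the Mover so existing replicas migrate to match. The path and policy name below are illustrative, not from this thread; the commands are printed for review rather than executed here.

```shell
# Illustrative commands only; run them on a node with HDFS client configs.
STEP1="hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD"
STEP2="hdfs mover -p /data/archive"
printf '%s\n%s\n' "$STEP1" "$STEP2"
```

Setting the policy alone only affects new writes; the Mover run is what relocates the blocks already on disk.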
09-05-2025
05:39 AM
1 Kudo
However, I was able to resolve this by leveraging ExecuteStreamCommand (ESC). Specifically, I used the Output Destination Attribute property to push the required attributes into it, which I can then process separately.
08-19-2025
10:33 PM
Is the source table a JdbcStorageHandler table? Please provide the DDL of the source table, the query used, and sample data if possible; this will help us understand the problem better. Also, review the output of the set -v command, especially configurations like hive.tez.container.size.
08-19-2025
07:10 AM
@RAGHUY Thank you! I figured that out later, but the Router now fails to start with the error below. I have the jaas.conf in place. Any help on this is appreciated.

ERROR client.ZooKeeperSaslClient - SASL authentication failed using login context 'ZKDelegationTokenSecretManagerClient' with exception: {}
javax.security.sasl.SaslException: Error in authenticating with a Zookeeper Quorum member: the quorum member's saslToken is null.
  at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:312)
  at org.apache.zookeeper.client.ZooKeeperSaslClient.respondToServer(ZooKeeperSaslClient.java:275)
  at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:882)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
2025-08-19 20:45:37,097 ERROR curator.ConnectionState - Authentication failed
2025-08-19 20:45:37,098 INFO zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x1088d05c6550015, likely server has closed socket, closing socket connection and attempting reconnect
2025-08-19 20:45:37,098 INFO zookeeper.ClientCnxn - EventThread shut down for session: 0x1088d05c6550015
2025-08-19 20:45:37,212 ERROR imps.CuratorFrameworkImpl - Ensure path threw exception
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hdfs-router-tokens
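One thing worth double-checking: the error names the login context 'ZKDelegationTokenSecretManagerClient', so the jaas.conf must contain a section with exactly that name; a generic Client section alone is not enough. A hedged sketch with placeholder keytab path and principal (adjust to your environment):

```
// Section name must match the login context named in the SASL error.
ZKDelegationTokenSecretManagerClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/path/to/hdfs.keytab"              // placeholder
  principal="hdfs/router-host@EXAMPLE.COM";  // placeholder
};
```

Also confirm the Router JVM is actually picking the file up (e.g. via -Djava.security.auth.login.config) and that the principal's keytab is readable by the Router process user.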
08-19-2025
01:50 AM
Hi, @quangbilly79 Yes, you can continue to use HDFS normally while the Balancer is running. The Balancer only moves block replicas between DataNodes to even out disk usage; it does not change file contents or metadata. Reads and writes are fully supported in parallel with balancing, and HDFS ensures data integrity through replication and checksums. The process adds some extra network and disk load, so you might see reduced performance during heavy balancing, but there is no risk of data corruption caused by the Balancer. You don't need to wait; it's safe to continue your normal operations.
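If the extra load during balancing becomes a concern, the per-DataNode balancer bandwidth can be capped. A small sketch (the 100 MB/s cap and 10% threshold are illustrative choices, not requirements); the commands are printed for review before running:

```shell
# 100 MB/s expressed in bytes/s, the unit setBalancerBandwidth expects.
BW=$((100 * 1024 * 1024))
echo "hdfs dfsadmin -setBalancerBandwidth $BW"
echo "hdfs balancer -threshold 10"
```

A lower bandwidth cap makes balancing slower but gentler on client workloads; the threshold is the allowed deviation in percent from average DataNode utilization.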