Created on 05-27-2026 09:58 PM - edited 05-27-2026 10:34 PM
Hi
We are having problems with slow ingest on a single instance of NiFi 2.5.0 running on a physical Windows 11 machine with 64 CPU cores, 64 GB RAM, and 12 TB of NVMe storage (4 x 3 TB drives) striped as a single D: drive. The C: drive is for Windows.
I was hoping someone could maybe give a sequential list of things to check so we at least are approaching this the right way.
The people who set up the machine have put the ingest directory (files are ingested via GetFile) and the NiFi instance (content repo and everything else) on the same striped D: drive. Since it is a striped set, I wouldn't have thought speed would be an issue, even with both ingest and content repo on the same set.
So one thing I wondered was whether we can force NiFi to use all cores, or whether we need to keep some cores (maybe 4?) for Windows. And if we did use, say, 60 cores, what is the best way to do this: is it set at the NiFi processor level, or do you need to force NiFi to use all cores at startup? The NiFi JVM has 8 GB of memory allocated and I haven't seen it go much past 50% utilization, which hopefully keeps garbage collection in check.
We do ingest individual files of 200 GB (but also smaller files), so would it be worth re-formatting the stripe set with a much larger block size to avoid additional seeks? I'm guessing they used standard 4 KB blocks; I was thinking of maybe using 1 MB blocks.
Standard Windows antivirus is running on this machine and also on our Dev machine, but in the Dev environment (an 8-core Windows 11 machine) ingest is as fast as we would expect.
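To put rough numbers on the block size question, here is a minimal sketch that counts how many allocation units a single 200 GB file occupies at 4 KB versus 1 MB. It assumes "200 GB" means 200 GiB and that the file fills whole allocation units; the function name is made up for illustration. The count alone does not predict throughput, it only shows the scale of the difference.

```python
GIB = 1024 ** 3


def allocation_units(file_bytes: int, unit_bytes: int) -> int:
    """Number of allocation units needed to hold file_bytes, rounding up."""
    return -(-file_bytes // unit_bytes)


file_size = 200 * GIB  # assumption: "200 GB" treated as 200 GiB

for label, unit in (("4 KB", 4 * 1024), ("1 MB", 1024 * 1024)):
    print(f"{label}: {allocation_units(file_size, unit):,} allocation units")
```

With these assumptions, the 4 KB case comes out at 256 times as many allocation units as the 1 MB case.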
Would welcome thoughts, thanks in advance....
Created on 05-28-2026 05:57 AM - edited 05-28-2026 06:00 AM
@zzzz77 In your bigger machine environment, are you adjusting the flow to tune performance? For example, do you increase concurrency, adjust the active thread pool, etc. to make sure that you are getting the most possible use of the cores? This is where you should start. You should be able to get a lot more active threads going in the larger environment before needing to worry about disk contention.
You may want to bump up the RAM min/max, but I would do this methodically. If it's 8 GB, go to 16 GB and see the results, then 32 GB, and compare all three. 32 GB should be as high as you need to go, but I have seen higher. NiFi does a good job of memory management above the min/max.
Ideally you would want the NiFi disks mounted separately (see docs), but since you already have a baseline in dev, likely without dedicated disks, I suspect you will see improvements using all 32+ cores vs 8 even with "slow" disks...
This is in k8s/nifi, but you will see how to crank up the CPU:
https://stevenmatison.com/blog/Max-CPU-with-NiFi-on-Minikube/
Created on 05-28-2026 03:31 PM - edited 05-28-2026 06:11 PM
Hi,
Thank you very much for this info, we will try it and let you know how it goes.
One of the other engineers was thinking of trying to start up NiFi with a switch using a string that had "FFFFFFFFFF" in it (I only saw it briefly), I think with the idea of forcing NiFi to start with all CPUs initially.
I think he was also looking at changing CPU affinity on the workstation via Windows Task Manager, but I don't know where he got to with it.
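If that string was a CPU affinity mask (an assumption on my part, since I only saw it briefly), each hex digit covers 4 cores, one bit per core. A minimal sketch of the arithmetic, with made-up helper names:

```python
def affinity_mask(cores: int) -> str:
    """Hex mask with the lowest `cores` bits set, one bit per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return format((1 << cores) - 1, "X")


def cores_in_mask(mask_hex: str) -> int:
    """Count how many cores (set bits) a hex mask selects."""
    return bin(int(mask_hex, 16)).count("1")


print(cores_in_mask("FFFFFFFFFF"))  # cores selected by the string I saw
print(affinity_mask(60))            # mask for 60 cores
print(affinity_mask(64))            # mask for all 64 cores
```

By this arithmetic a mask of ten F digits selects 40 cores, not 64, so if it really was an affinity mask it would need sixteen F digits to cover every core.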
Hypothetically, say you had a setting of nifi.flowcontroller.max.timer.driven.thread.count=36 and 10 processors on your canvas, each set to run 4 concurrent tasks, so up to 40 threads could be requested at once. In my mind this means one of two things: either one downstream processor is completely starved with zero threads and causes a bottleneck, or NiFi reduces the thread allocation across all processors to maximise output, for example by allocating 3 threads per processor and leaving 6 in reserve. Which is it? In this case, would it be better to set nifi.flowcontroller.max.timer.driven.thread.count=48 to ensure there is no thread starvation and leave a safety window of 48 - (10 x 4) = 8?
If you have a safety window of 8 threads, would these be allocated to, say, garbage collection and other activities, or is that controlled by the threads in use by the JVM? Does NiFi tell the JVM it needs to run with a minimum of 48 threads because nifi.flowcontroller.max.timer.driven.thread.count=48 has been set? How does that JVM/NiFi interplay work, please?
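The arithmetic in my hypothetical, as a small sketch. It only compares the pool size against the worst case where every processor wants all of its concurrent tasks at the same moment; it says nothing about how NiFi actually hands out threads, which is the part I am asking about. The function name is made up.

```python
def thread_headroom(pool_size: int, processors: int, concurrent_tasks: int) -> int:
    """Pool size minus worst-case demand (every processor fully busy).

    A negative result means the worst-case demand exceeds the pool.
    """
    peak_demand = processors * concurrent_tasks
    return pool_size - peak_demand


for pool in (36, 48):
    print(pool, thread_headroom(pool, processors=10, concurrent_tasks=4))
```

With a pool of 36 the worst case is short by 4 threads, and with 48 there are 8 to spare.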
I'm trying to understand the right way to get tuning right to avoid resource starvation. 🙂
Cheers
Created on 05-30-2026 03:07 AM - edited 05-30-2026 03:08 AM
@zzzz77
I would start with Step 1, Defender exclusions: it is the single most impactful change, requires no downtime, and takes under 5 minutes. Monitor throughput immediately after applying it, before doing anything else. In most cases this alone resolves 80% of the performance gap.
SYMPTOMS:
✗ Slow ingest on production (64 core, 64GB RAM)
✓ Fast ingest on dev (8 core machine)
LIKELY CAUSES IN PRIORITY ORDER:
1. Windows Defender scanning 200GB files in real-time ← MOST LIKELY
2. NiFi thread pool under-configured for 64 cores
3. JVM heap too small (8GB for 200GB file processing)
4. Ingest dir and content repo on same I/O path
5. 4KB block size misaligned for large file processing
6. NiFi content repository not spread across drives
7. Java GC pressure under large file load
8. Windows power plan not set to High Performance
9. GetFile processor running single-threaded
10. NiFi provenance repository competing for I/O
STEP 1 WINDOWS DEFENDER (DO THIS FIRST)
This is almost certainly your primary bottleneck. Defender scans every byte of every 200GB file as NiFi reads it.
How to verify Defender is the issue before making changes:
STEP 2 WINDOWS POWER PLAN
Happy hadooping
Let me know how it works out
Created 06-01-2026 03:21 PM
Hi
Thank you very much for this info, we will apply and let you know how it goes.
Cheers
Created 06-01-2026 09:48 PM
A few things stand out from the numbers you've shared.
On a 64-core machine, ingesting ~11 million rows in 17 minutes (around 10–11K rows/sec) is significantly below what I'd expect if the workload were effectively parallelized. Before focusing on CPU count, I'd investigate where the bottleneck actually is.
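For reference, the rate quoted above is just the row count divided by the elapsed seconds. A minimal sketch, taking "~11 million rows" as exactly 11,000,000 for illustration:

```python
def rows_per_second(rows: int, minutes: float) -> float:
    """Average ingest rate over the whole run."""
    return rows / (minutes * 60)


rate = rows_per_second(11_000_000, 17)
print(f"{rate:,.0f} rows/sec")
```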
Some areas worth checking:
Storage throughput: Is the data being written to local SSDs, network-attached storage, or slower disks? Ingest workloads are often I/O-bound rather than CPU-bound.
File size and partitioning strategy: Large numbers of small files can severely impact write performance.
Compression settings: Certain codecs provide better compression but consume more CPU during ingest.
Thread parallelism: Verify that the ingestion framework is actually utilizing all available cores rather than being limited by a small worker pool.
Memory pressure and GC activity: If the JVM is spending significant time in garbage collection, additional CPU cores won't help much.
Network throughput: If data is being pulled from a remote source, the bottleneck may be upstream rather than on the ingest node itself.
I'd also recommend collecting:
CPU utilization during ingest
Disk IOPS and throughput metrics
Memory usage and GC logs
Number of concurrent ingest tasks
Average file size being generated
One quick diagnostic is to look at overall CPU utilization. If the machine is only using 10–20% of available CPU during ingest, then the workload is likely blocked on I/O, synchronization, network transfers, or application-level limits rather than raw compute capacity.
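That rule of thumb can be written down as a simple check. This is only a sketch: the 20% cutoff is taken from the range mentioned above and is a heuristic, and the function name is hypothetical. Feed it the average CPU utilization you observed during ingest.

```python
def suggests_non_cpu_bottleneck(cpu_percent: float, threshold: float = 20.0) -> bool:
    """True when average CPU utilization is low enough that the bottleneck
    is probably I/O, synchronization, network, or an application limit."""
    if not 0.0 <= cpu_percent <= 100.0:
        raise ValueError("cpu_percent must be between 0 and 100")
    return cpu_percent <= threshold


print(suggests_non_cpu_bottleneck(15.0))  # low utilization
print(suggests_non_cpu_bottleneck(85.0))  # high utilization
```

A True result only says where not to look first; it does not identify which of the other causes applies.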
Can you share:
Which ingestion tool/framework you're using?
The storage type (SSD, NVMe, HDD, cloud volume, etc.)?
Average CPU utilization during the 17-minute ingest?
Whether the target table is partitioned and, if so, by what key?
Those details would make it much easier to determine whether the bottleneck is CPU, disk, network, or configuration-related.