I set up a single-node NiFi instance to test the performance of a simple ConsumeKafka flow. The node has 16 cores and 32 GB RAM, with the JVM heap set to 16 GB. The flow is just two processors: ConsumeKafka followed by LogMessage. ConsumeKafka runs with 3 concurrent tasks, matching the 3 partitions of the Kafka topic. The swap threshold is set to 20,000,000, so no swapping or backpressure should kick in and slow down the flow.
With this setup I got approximately 8M/s throughput without a demarcator, i.e., every incoming Kafka message generates its own flowfile. To improve throughput, I then set a message demarcator so that up to 10,000 records are written into a single flowfile. However, the throughput actually decreased dramatically, to about 3M/s. I don't see any bottleneck in CPU, heap usage, or disk I/O. Is there a setting I missed that is holding back the ingestion rate?
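For reference, here is a sketch of the ConsumeKafka properties involved. The property names are taken from the ConsumeKafka processor's property list; the demarcator value (newline) is my assumption of a typical choice, and the comments describe how I understand these settings to interact:

```
# ConsumeKafka processor settings (sketch, not an exact dump of my config)
Concurrent Tasks    : 3       # one task per topic partition
Message Demarcator  : \n      # set => multiple messages batched into one flowfile
Max Poll Records    : 10000   # caps how many messages one poll (and thus one flowfile) can hold
```

My understanding is that with a demarcator set, each flowfile contains at most one poll's worth of messages, bounded by Max Poll Records, so the actual batch size per flowfile may be far below 10,000 if polls return fewer records.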