I set up a single-node NiFi instance to test the performance of a simple ConsumeKafka flow. The node has 16 cores and 32 GB of RAM, with JVM memory set to 16 GB. The flow has just two processors: ConsumeKafka followed by LogMessage. ConsumeKafka runs with 3 threads, since the Kafka topic has 3 partitions. The swap threshold is set to 20000000, so neither swapping nor backpressure should kick in and slow down the flow.
I get approximately 8M/s throughput with this setup when no demarcator is set, that is, when every incoming Kafka message generates its own flowfile. To improve throughput, I set a demarcator so that 10000 records are now written to a single flowfile. However, throughput actually drops dramatically, to 3M/s. I don't see any bottleneck in CPU, heap usage, or disk I/O. Are there settings I missed that are holding back the data ingestion rate?
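To make clear what I mean by the demarcator setup, here is a minimal Python sketch of the batching behaviour as I understand it. It is only an illustration, not NiFi's actual code: the function name `bundle_messages` and its parameters are made up for this post, and it assumes messages are simply joined with the demarcator until the per-flowfile record limit is reached.

```python
from typing import Iterable, List


def bundle_messages(
    messages: Iterable[bytes], demarcator: bytes, max_records: int
) -> List[bytes]:
    """Group messages into flowfile-like payloads of at most max_records each.

    Hypothetical helper for illustration only; it models the batching I
    expect from the demarcator, not how NiFi implements it.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")

    payloads: List[bytes] = []
    batch: List[bytes] = []
    for message in messages:
        batch.append(message)
        if len(batch) == max_records:
            # Batch is full: emit one payload with messages joined by the demarcator.
            payloads.append(demarcator.join(batch))
            batch = []
    if batch:
        # Emit whatever is left over as a final, smaller payload.
        payloads.append(demarcator.join(batch))
    return payloads


if __name__ == "__main__":
    # 25000 small messages with a limit of 10000 records per payload.
    sample = (b"msg" for _ in range(25000))
    print(len(bundle_messages(sample, b"\n", 10000)))
```

With no demarcator, the same 25000 messages would become 25000 flowfiles, so I expected the bundled case to carry far less per-flowfile overhead, not less throughput.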