I have a NiFi cluster of 3 nodes with an external ZooKeeper and a simple dataflow consisting of ConsumeKafka + JoltTransformJSON. All 3 NiFi nodes run on dedicated virtual machines with 32 GB RAM and 16 cores, and the NiFi heap size is set to 8 GB. The problem I notice is that memory usage never goes down, even when all processors are stopped and the queues are empty.
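For reference, this is how the heap is bounded on each node. The settings live in conf/bootstrap.conf; the java.arg indices below are the stock defaults, shown here as an assumed example of my configuration:

```properties
# JVM heap bounds for NiFi, set in conf/bootstrap.conf
# (argument indices match the shipped defaults; values are my 8 GB setting)
java.arg.2=-Xms8g
java.arg.3=-Xmx8g
```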
Heap usage while the dataflow is running is usually around 40-70% (of the 8 GB heap), which isn't too bad. However, according to the VM monitoring tool, overall memory usage is usually around 80% (of the full 32 GB RAM, and NiFi is the only process running on each VM). After the dataflow finishes, there is no significant decrease in either heap or memory usage: heap stays around 40-50%, and memory at least 60%. Restarting NiFi brings memory down from ~90% to ~70%. I even tried stopping the NiFi process entirely, and memory usage was still at 60%. So I have to restart each VM after every 2 or 3 runs for best performance.
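One thing I have not ruled out (this is my guess, not something the monitoring tool confirms): on Linux, the page cache counts toward "used" memory in many monitors, while /proc/meminfo's MemAvailable excludes reclaimable cache. A quick sketch to compare the two on one of the VMs, assuming a Linux guest:

```shell
# Compare naive "used" memory (MemTotal - MemFree) with MemAvailable,
# which treats reclaimable page cache as free. A large gap suggests the
# "missing" memory is cache the kernel can reclaim, not a leak.
awk '/^MemTotal:/     {total=$2}
     /^MemFree:/      {free=$2}
     /^MemAvailable:/ {avail=$2}
     END {
       printf "used (total-free): %d MiB\n", (total - free) / 1024
       printf "available:        %d MiB\n", avail / 1024
     }' /proc/meminfo
```

If "available" is much larger than MemTotal minus "used", the 60% figure after stopping NiFi would mostly be cache rather than memory the process failed to release.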
What could cause this memory behavior? From my understanding, FlowFiles and their content are stored in memory while they sit in a queue, so ideally there shouldn't be much memory usage once the queues are empty and all processors are stopped. I read a post pointing out that some archiving/cleanup keeps running even when no dataflow is active, but memory usage should eventually decrease to a normal level, right?