My Dev NiFi instance is stuck (no active threads - nothing happening). I can see two errors in the log:
Cannot update repository because all partitions are unusable at this time. Writing to the repository would cause corruption. This most often happens as a result of the repository running out of disk space or the JMV running out of memory.
Unable to merge /dfs1/nifi/data/provenance_repository/journals/4289216.journal.15 with other Journal Files due to java.io.FileNotFoundException: Unable to locate file /dfs1/nifi/data/provenance_repository/journals/4289216.journal.15
As suggested above, I looked at the disks and memory. The disks are fine (>30% free), but it looks like the JVM is running out of memory, as heap usage is currently (and consistently) 97%+. The machine itself still has 8 GB free.
Are there legitimate reasons that NiFi might run out of memory or does this look more like a memory leak? There's lots of custom processors running around but I don't have access to the code.
Are there resources about Java memory management in a NiFi-specific context? Just trying to narrow down what might have caused this.
NiFi version is 0.6
Hard to say what any custom code is doing as far as heap usage, but some existing processors can use considerable heap space. I would say that FlowFile attributes consume the majority of the heap space in most cases. A FlowFile consists of two parts: the FlowFile content (which lives in the NiFi content repository) and the FlowFile attributes (metadata about the FlowFile, which lives in heap). While the amount of heap that FlowFile attributes consume is generally small, users can build flows that have the exact opposite effect. If the user's dataflow uses processors to read large amounts of content and write it to FlowFile attributes, heap usage will go up rapidly. If users allow large connection queues to build within the dataflow, heap usage will also go up.
- Evaluate available system memory and the configured heap size for your NiFi. The heap defaults for NiFi are relatively small. They are set in bootstrap.conf and have default values of only 512 MB min and max. This is generally too small for any significant dataflow. I recommend setting both min and max to the same value. Adjust these values according to the available free memory on your system without going too crazy. Try 4096 MB first and see how that performs. Adjusting the heap settings requires a NiFi restart to take effect.
- Evaluate your dataflow for areas where large connection queues exist. Setting backpressure throughout your dataflow is one way to keep queues from growing too large.
- Evaluate your flow for anywhere you may be extracting content from your FlowFiles into FlowFile attributes. Is it necessary, or can the amount of content extracted be reduced?
- Processors like MergeContent, SplitContent, SplitText, etc. can use a lot of heap depending on the incoming FlowFile(s) and configuration. For example, a MergeContent configured to merge 100,000 FlowFiles is going to use a lot of heap binning that many FlowFiles. A better approach is to use two MergeContent processors in a row, with the first merging 10,000 FlowFiles and the second merging bundles of 10 to create the desired end result of 100,000. The same goes for SplitText. If your source FlowFile results in more than 10,000 splits, try using two SplitText processors (the first splitting every 10,000 lines and the second splitting those by every line). With either example above you are reducing the number of FlowFiles held in heap memory at any given time.
 -- NiFi uses FlowFile swapping to help reduce heap usage. FlowFile attributes live in heap memory for faster processing. If a connection exceeds the configured swap threshold (default 10,000, set in nifi.properties), NiFi begins swapping FlowFile attributes out to disk. Remember that this swapping is per connection. It is not based on heap usage but on object-count thresholds, so the values may need to be adjusted based on average FlowFile attribute size.
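To put rough numbers on the two-stage split suggestion above, here is a minimal Python sketch. It is only arithmetic, not NiFi code: the function name and the 100,000-line input are assumptions for illustration, and it simply counts how many FlowFiles a single incoming FlowFile would be split into at each stage.

```python
import math

def splits_per_invocation(total_lines, stage_line_counts):
    """Return, for each split stage, how many FlowFiles ONE incoming
    FlowFile is turned into at that stage.

    stage_line_counts: the "lines per split" value of each stage, in order
    (hypothetical stand-in for chained SplitText processors).
    """
    results = []
    lines_in = total_lines  # lines in one FlowFile entering the stage
    for lines_per_split in stage_line_counts:
        results.append(math.ceil(lines_in / lines_per_split))
        # Each output FlowFile of this stage holds at most lines_per_split lines.
        lines_in = min(lines_in, lines_per_split)
    return results

single_stage = splits_per_invocation(100_000, [1])         # one processor, line by line
two_stage = splits_per_invocation(100_000, [10_000, 1])    # 10,000-line chunks, then lines

print(max(single_stage), max(two_stage))
```

With these assumed numbers, the single-stage flow creates 100,000 FlowFiles from one input in one go, while no stage of the two-stage flow creates more than 10,000 from a single input.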
Thanks @Matt, that's a big help! It aligns with my understanding, although I didn't know about the attributes. I currently have:
With respect to point 4, this would only come into effect when the specific processor is running, correct? If all relevant split/merge processors were stopped, then this shouldn't have an effect.
I can only imagine it's a leak somewhere; I can't see any other reason why the heap would have grown to that size.
If I were to turn off all processors and empty all queues, and the memory still didn't drop, would this indicate a leak?
There are other things that also exist in heap memory space within the NiFi JVM:
Component Status History:
NiFi will store status history data points for all processors on the NiFi canvas (including those that are stopped). You can see this stored status history by right clicking on a component and selecting "view status history". Each component has numerous stats for which these data points are retained. All these component status points are stored in heap memory. The number of points per each stat that is held in heap is controlled in the nifi.properties file:
nifi.components.status.repository.buffer.size --> Specifies the buffer size for the Component Status Repository. The default value is 1440.
nifi.components.status.snapshot.frequency --> This value indicates how often to present a snapshot of the components' status history. The default value is 1 min.
So on every restart of NiFi these stats are gone, since they are held in heap only. Then, over the course of the default 24 hours (1440 one-minute snapshots), heap usage grows. You can reduce the heap used by status history by adjusting the above properties: take snapshots less frequently (perhaps every 5 minutes) and reduce the number of data points retained from 1440 to 380 or lower.
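As a rough illustration of how those two properties interact, the following Python sketch works out the time window covered by the buffer and the number of data points kept in heap. The component count (500) and stats per component (10) are made-up assumptions, not figures from NiFi or from this flow.

```python
def status_history_window_hours(buffer_size, snapshot_minutes):
    """Hours of history covered once the status buffer is full."""
    return buffer_size * snapshot_minutes / 60

def retained_points(components, stats_per_component, buffer_size):
    """Total data points held in heap when every buffer is full."""
    return components * stats_per_component * buffer_size

# Defaults quoted above: 1440 points, one snapshot per minute.
default_hours = status_history_window_hours(1440, 1)
# Suggested tuning: 380 points, one snapshot every 5 minutes.
tuned_hours = status_history_window_hours(380, 5)

# Hypothetical canvas: 500 components, 10 stats each (assumed values).
default_points = retained_points(500, 10, 1440)
tuned_points = retained_points(500, 10, 380)

print(default_hours, round(tuned_hours, 1))
print(default_points, tuned_points)
```

Under these assumptions the tuned settings keep about a quarter of the data points while still covering more than a day of history.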
All uploaded templates (whether they are instantiated to the canvas or not) are held in heap memory. You can reduce heap memory usage by deleting uploaded templates you will no longer be instantiating to the canvas.
Your entire flow is held in heap memory. The more components you have on the canvas, the larger the heap footprint.
Even if no processors are running, the FlowFile attributes for FlowFiles loaded into each connection between processors are held in heap memory. (There is a swap threshold, configurable in the nifi.properties file, which triggers a connection to start swapping to disk if the number of queued FlowFiles exceeds it.)
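For a back-of-the-envelope feel of how queued FlowFiles translate into heap, here is a small Python sketch. It is a simplification under stated assumptions: it treats the swap threshold as a hard per-connection cap on in-heap FlowFiles, ignores JVM object overhead, and uses invented queue sizes and an invented average attribute size of 2 KB.

```python
def estimated_attribute_heap_mb(queue_sizes, swap_threshold, avg_attr_bytes):
    """Rough estimate of heap (in MB) used by FlowFile attributes.

    queue_sizes: queued FlowFile count per connection (assumed values).
    swap_threshold: per-connection count above which FlowFiles are swapped to disk.
    avg_attr_bytes: assumed average size of one FlowFile's attributes.
    """
    # Swapping is per connection, so cap each queue individually.
    in_heap = sum(min(queued, swap_threshold) for queued in queue_sizes)
    return in_heap * avg_attr_bytes / (1024 * 1024)

# Three hypothetical connections, using the 10,000 threshold mentioned in this thread.
queues = [50_000, 2_000, 10_000]
print(estimated_attribute_heap_mb(queues, 10_000, 2048))
```

Because the cap applies per connection, many moderately full queues can add up to far more heap than one very deep queue, and larger attributes scale the total linearly.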