Hi guys, I am using NiFi 1.0. Is there any known memory leak issue? I have around 123 processors and 200 FlowFiles of 50 MB each. After processing 100 files, all processors start throwing this exception: java.lang.OutOfMemoryError: Java heap space.
I have allocated 4 GB of RAM in bootstrap.conf, and my flow works perfectly fine for the first 100 files. Please suggest any optimizations required. Is NiFi built to handle this number of processors and files?
NiFi can certainly handle dataflows with well in excess of 123 processors and well in excess of the number of FlowFiles you have here. Different processors put different resource (CPU, memory, and disk I/O) strain on your hardware. In addition to processors having an impact on memory, so do FlowFiles themselves. A FlowFile is a combination of the physical content (stored in the NiFi content repository) and FlowFile attributes (metadata associated with the content, stored in heap memory). You can experience heap memory issues if your FlowFiles have very large attribute maps (for example, from extracting large amounts of content into attributes).
The first step is identifying which processor(s) in your flow are memory intensive and resulting in your OutOfMemoryError. Processors such as SplitText, SplitXml, and MergeContent can use a lot of heap if they are producing a lot of split files from a single file or merging a large number of files into a single file. The reason is that the merging and splitting happens in memory until the resulting FlowFile(s) are committed to the output relationship.
There are ways of handling this resource exhaustion via dataflow design: for example, merging a smaller number of files multiple times (using multiple MergeContent processors) to produce that one large file, or splitting files multiple times (using multiple Split processors). Also be mindful of the number of concurrent tasks assigned to these memory-intensive processors.
Running with 4 GB of heap is good, but depending on your dataflow, you may find yourself needing 8 GB or more of heap to satisfy the demand created by your dataflow design.
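To put rough numbers on the multi-stage merge idea above, here is a minimal Java sketch. It is plain arithmetic, not NiFi code: the class name, the method name, and the example figures (1,000,000 inputs, at most 1,000 per merge) are made up for illustration. It counts how many chained merge passes are needed if no single merge may combine more than a fixed number of inputs.

```java
// Hypothetical helper, not part of NiFi: estimates how many chained merge
// passes are needed when each merge may combine at most maxPerMerge inputs.
class StagedMergeSketch {

    static int passesNeeded(long totalFiles, int maxPerMerge) {
        if (totalFiles < 1 || maxPerMerge < 2) {
            throw new IllegalArgumentException("need totalFiles >= 1 and maxPerMerge >= 2");
        }
        int passes = 0;
        long remaining = totalFiles;
        while (remaining > 1) {
            // Each pass turns every group of up to maxPerMerge files into one file
            // (ceiling division).
            remaining = (remaining + maxPerMerge - 1) / maxPerMerge;
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        // Example figures are assumptions chosen only for illustration.
        System.out.println(passesNeeded(1_000_000L, 1_000)); // prints 2
    }
}
```

With those figures, the first pass produces 1,000 intermediate files and the second merges them into one, so no single merge has to track more than 1,000 inputs at a time.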
I am not using any of these processors. I am using ReplaceText. The main problem is that my NiFi flow is able to process around 250 files, but after that, even if I give it 1 file to process, it throws an OutOfMemoryError. I am using -XX:+UseG1GC in bootstrap.conf. I suspect the memory used by previously processed files is not being freed, which is causing the out-of-memory issue.
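One way to test the theory that memory is not being freed is to watch used heap over time and see whether it keeps climbing. The sketch below uses the standard java.lang.management API to read heap figures. Note the assumptions: it reports on the JVM it runs in, not on a running NiFi instance, so it only illustrates the kind of reading you would want to take from NiFi's JVM; the class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of NiFi: prints heap figures for the JVM
// this code runs in.
class HeapCheckSketch {

    // Used heap as a percentage of max heap, or -1 if the max is undefined.
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB (%.1f%% of max)%n",
                heap.getUsed() / mb,
                heap.getCommitted() / mb,
                heap.getMax() / mb,
                usedPercent(heap.getUsed(), heap.getMax()));
    }
}
```

If used heap stays near the maximum even when the queues are empty, that points toward something holding references rather than the heap simply being too small for the workload.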
Could you please list the processors you have in the flow?
The processors Matt notes can use a decent chunk of memory, but that is not really based on the original size of the input entry. It is more about the metadata for the individual FlowFiles themselves. So a large input file does not necessarily mean large heap usage. The metadata for the FlowFiles is in memory, but typically only a very small amount of content is ever in memory.
Some processors though do use a lot of memory for one reason or another. We should probably put warnings about them in their docs and on the UI.
Let's look through the list and identify candidates.
Hi, most of the time it is ReplaceText that throws this error. Moreover, the file size is 2.5 MB and there is only 1 file in the queue. The processor had already processed around 1.5 GB of data, but after that it is not able to process a single file. I am using Java 8 and have tried it on NiFi 0.7 and NiFi 1.0.