NiFi Memory Usage
Labels: Apache NiFi
Created ‎03-12-2025 02:51 AM
The NiFi version I use is 1.23.2. After a restart, memory usage drops, but it increases steadily as time goes by; as a result, memory usage reaches 100% day by day. The MonitorMemory output for now looks like this:
The JVM memory args are:
# JVM memory settings
java.arg.2=-Xms32G
java.arg.3=-Xmx32G
What is the reason for this situation? Shouldn't memory usage reach a certain level and stop?
Created ‎03-17-2025 11:51 AM
@hus
Java garbage collection will not kick in until JVM heap usage reaches roughly 80%, so setting up the MonitorMemory reporting task to alert at a 1% threshold is just going to be noisy.
Now you are reporting that your NiFi will eventually use the full 32GB of allocated heap memory. This is most commonly related to one of the following:
- A memory leak in a custom processor added to your Apache NiFi install.
- Creating excessively large FlowFile attributes. NiFi FlowFiles are held in heap memory (FlowFile attributes/metadata are held in heap; FlowFile content is NOT held in heap). So be careful anywhere in your dataflow where you may be extracting large amounts of content into FlowFile attributes.
- Leaving FlowFiles accumulated in connections. Since FlowFiles live in heap memory, leaving connections queued with FlowFiles all over your dataflows will consume heap. Don't use NiFi connections to hold your data; I have seen many dataflows where failure relationships connect to a stopped component and the queue has grown large.
- Poor dataflow design around memory-intensive processor components. The built-in usage documentation for each component includes a "resource consideration" section. Just because the resource consideration says "MEMORY" does not always mean the processor will use a lot of memory; often it depends on how the processor is configured (examples: MergeContent, SplitText).
- Keeping (deprecated) templates stored in NiFi. When you create a NiFi template, it is written to the flow.json.gz/flow.xml.gz file that also holds everything you see on the canvas, and that file is loaded into heap memory. Download your templates to store them off of NiFi, and then delete them from NiFi's internal storage. A better option is to stop using templates completely; they were deprecated a while ago and no longer exist at all in the Apache NiFi 2.x releases.
- Using the built-in DistributedMapCacheServer controller service to store large amounts of cache entries, since that controller service stores them in NiFi heap memory.
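To see why the large-attribute and queued-FlowFile points above matter, note that the heap held by queued FlowFiles scales roughly as queue depth times per-FlowFile attribute size. A quick back-of-the-envelope calculation (all numbers here are hypothetical, purely for illustration):

```python
# Rough heap estimate for queued FlowFiles (illustrative numbers only;
# actual per-FlowFile overhead in NiFi will differ).
queued_flowfiles = 500_000          # FlowFiles sitting in connections
attr_bytes_per_ff = 20 * 1024       # e.g. a 20 KB extracted-content attribute

heap_bytes = queued_flowfiles * attr_bytes_per_ff
print(f"~{heap_bytes / 2**30:.1f} GiB of heap just for attributes")
```

With numbers like these, attributes alone would consume roughly 9.5 GiB of a 32 GB heap, which is why extracting content into attributes and letting queues back up can steadily eat heap.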
While Java garbage collection will clean up unused heap memory, it cannot clean up heap that is still in use.
You can start up your NiFi with all components on the canvas stopped by changing the following property in nifi.properties to false:
nifi.flowcontroller.autoResumeState
This will allow you to see what your heap usage looks like from just starting up NiFi without anything running yet. That heap usage reflects the loading of NiFi itself (which includes the uncompressed flow.json.gz, the NiFi NARs, and any queued FlowFiles). Memory usage should stay relatively flat after everything is loaded. Then start your dataflows one at a time to see how each impacts heap. You could also take Java heap dumps to analyze what is using the heap (although a 32 GB heap dump would require at least that much free memory somewhere to look at it, so you may want to decrease your heap allocation while troubleshooting).
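As a concrete sketch of that procedure (the property name comes from the post above; the process-lookup pattern and file paths are assumptions for a typical install, so adjust them to your environment):

```shell
# 1. Prevent components from auto-starting on the next restart
#    (edit conf/nifi.properties in your NiFi install directory):
#      nifi.flowcontroller.autoResumeState=false

# 2. Restart NiFi, then watch heap/GC behavior over time with jstat;
#    old-gen utilization (O column) should plateau once everything is loaded.
#    Samples every 5000 ms:
jstat -gcutil "$(pgrep -f org.apache.nifi.NiFi)" 5000

# 3. If heap keeps climbing as you start flows, capture a heap dump for
#    offline analysis (e.g. with Eclipse MAT); "live" forces a full GC first:
jmap -dump:live,format=b,file=/tmp/nifi-heap.hprof "$(pgrep -f org.apache.nifi.NiFi)"
```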
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created ‎03-21-2025 02:57 AM
First of all, thank you for your reply @MattWho.
I think the reason is the DistributedMapCacheServer, as you said, because I use a DistributedMapCacheServer in my stream.
As far as I know, you need to define a DistributedMapCacheServer in order to use the DistributedMapCacheClientService. When I look at the properties of the two controller services, one parameter catches my eye: the DetectDuplicate processor's parameters, as I shared in the image.
If the Age Off Duration parameter is set to 30 minutes, doesn't it delete the data stored in the cache server after 30 minutes? If not, is there a way to clean up the cache?
Created ‎03-24-2025 05:47 AM
@hus
There are two controller services you are using for your map cache:
- DistributedMapCacheServer - When started, this controller service creates a separate map cache server on every node in a NiFi cluster. These map cache servers do not share cached entries between them. In Apache NiFi 2.x+, "Distributed" has been removed from the name to avoid confusion. The "Max Cache Entries" and "Eviction Strategy" properties control how cached entries are removed from the cache.
- DistributedMapCacheClientService - This controller service is used to write data to a specific map cache server (by server hostname). It also has "Distributed" removed from its name as of Apache NiFi 2.x.
You are using the DetectDuplicate processor to interact with the above Controller services.
While the DetectDuplicate processor has a configurable "Age Off Duration" setting, a cached entry is removed at that configured age-off ONLY when both of the following conditions have been met:
- At least one duplicate has been detected.
- Age off duration has expired.
So any cached entry for which a duplicate has not yet been detected will remain in the cache server until the "Max Cache Entries" and "Eviction Strategy" settings result in its removal.
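The age-off semantics described above can be sketched as a toy model. This is NOT NiFi's actual implementation (the real cache server and eviction strategies differ); it only illustrates why entries that never see a duplicate outlive the age-off duration:

```python
class ToyDetectDuplicateCache:
    """Toy model of the age-off behavior described above: an entry is
    aged off ONLY if a duplicate was seen for it AND the duration expired.
    Otherwise it lingers until max-entries eviction (oldest-first here)."""

    def __init__(self, max_entries=3, age_off=30.0):
        self.max_entries = max_entries
        self.age_off = age_off
        self.cache = {}  # key -> (inserted_at, duplicate_seen)

    def detect(self, key, now):
        # Age off entries that have BOTH seen a duplicate AND expired.
        for k in list(self.cache):
            inserted_at, dup_seen = self.cache[k]
            if dup_seen and now - inserted_at >= self.age_off:
                del self.cache[k]
        if key in self.cache:
            inserted_at, _ = self.cache[key]
            self.cache[key] = (inserted_at, True)  # mark duplicate seen
            return True                            # duplicate detected
        # Evict the oldest entry if the cache is full.
        if len(self.cache) >= self.max_entries:
            oldest = min(self.cache, key=lambda k: self.cache[k][0])
            del self.cache[oldest]
        self.cache[key] = (now, False)
        return False
```

In this model, a key seen only once stays cached well past the age-off duration, which matches the growth behavior described above: without duplicates, the cache fills to max and only then does eviction start.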
So depending on what data you are caching, the number set for "Max Cache Entries", and the number of duplicates you detect, your cache server likely continues to grow to max, and then eviction starts. If you have a "Persistence Directory" configured, the cached data is also being written to that directory so that it is not lost if the NiFi instance or DistributedMapCacheServer is restarted. This also means that after a NiFi restart the persisted cache is loaded back into heap memory.
Keep in mind that there are other external cache server options that do have HA, are distributed, and would not consume NiFi's heap or memory on the NiFi host if installed on a different server/host.
- RedisDistributedMapCacheClientService
- SimpleRedisDistributedMapCacheClientService
- HazelcastMapCacheClient
- CouchbaseMapCacheClient - Removed as of Apache NiFi 2.x
- HBase_2_ClientMapCacheService - Removed as of Apache NiFi 2.x
- CassandraDistributedMapCache - Removed as of Apache NiFi 2.x
Thank you,
Matt
