Age Off Duration

New Contributor

Good day,

There are NiFi processors that store state in memory, one example being the DetectDuplicate processor, which "remembers" FlowFiles based on a specific attribute. The duration for which this memory is retained is determined by the Age Off Duration parameter. I would like to know the default value for this parameter and whether it is possible to configure it to retain the memory indefinitely (i.e., with no expiration). Currently, I am using a value of '365000 days'.

Additionally, I would like to know if there is a way to preserve the detected values even after a NiFi restart.

1 ACCEPTED SOLUTION

Master Mentor

@jirungaray 

Welcome to the community.

The DetectDuplicate processor does not store anything in NiFi state providers (the local state directory or cluster state in ZooKeeper). Instead, it uses a DistributedMapCache service to store cached items. Depending on the cache service used, that service may offer retention settings for the number of cache entries and for cache entry persistence.

Any NiFi component that retains state will indicate such in its documentation under the "State Management" section.


The "Age Off Duration" configuration ages off cache entries that still exist when that duration is reached, but it cannot control the number of cache entries the cache service retains. The cache service may therefore evict cache entries before the configured Age Off Duration is reached.
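To illustrate how a size limit on the cache can defeat a long Age Off Duration, here is a minimal sketch. It is a toy model and not NiFi's implementation: the `BoundedCache` class, the `is_duplicate` function, and the first-in-first-out eviction are assumptions made up for this example.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy stand-in for a size-limited cache service (hypothetical, FIFO eviction)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # key -> time the key was cached

    def put_if_absent(self, key, now):
        """Return the stored entry time if the key is cached; otherwise cache it and return None."""
        if key in self.entries:
            return self.entries[key]
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict the oldest entry to make room
        self.entries[key] = now
        return None


def is_duplicate(cache, key, now, age_off_seconds):
    """Flag a key as a duplicate only if it is still cached and not yet aged off."""
    entry_time = cache.put_if_absent(key, now)
    if entry_time is None:
        return False  # never seen, or already evicted by the cache itself
    if now - entry_time >= age_off_seconds:
        cache.entries[key] = now  # aged off: treat as new and refresh the entry time
        return False
    return True


# A cache limited to 2 entries, with a generous one-hour age-off.
cache = BoundedCache(max_entries=2)
keys = ["a", "a", "b", "c", "a"]
results = [is_duplicate(cache, key, now, 3600) for now, key in enumerate(keys)]
print(results)
```

In this run the second "a" is flagged as a duplicate, but the last "a" is not: it was evicted when "c" arrived, long before the one-hour age-off would have applied.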

Since you mention that your cache entries are not being preserved on NiFi restart, I assume you have configured your DetectDuplicate to use the DistributedMapCacheClientService. The DistributedMapCacheClientService depends on a running DistributedMapCacheServer. This DistributedMapCacheServer holds cache entries within NiFi's heap memory and, unless you have configured a "Persistence Directory", will lose all cache entries when the NiFi service stops. The DistributedMapCacheServer also has a configurable threshold for the maximum number of cache entries it will hold before evicting cache entries based on the configured eviction strategy. This configuration establishes an upper bound. Keep in mind that the higher the Max Cache Entries setting, the more NiFi heap memory is used, which could lead to NiFi experiencing OutOfMemory (OOM) exceptions. Since it sounds like you want to retain a very large number of cached entries, I'd recommend against using the NiFi internal DistributedMapCacheClientService, considering the high heap memory usage it would require and the high likelihood that it would impact your NiFi's stability and performance.

NOTE: The DistributedMapCacheClientService and DistributedMapCacheServer do NOT offer any form of High Availability. The DistributedMapCacheClientService can only be configured with a single server hostname. While the DistributedMapCacheServer, when started, does create a running cache server on every host in the NiFi cluster, the cached entries are not shared or replicated across them. ONLY the cache server on the hostname configured in the DistributedMapCacheClientService is used. For HA, you should use a more robust cache service external to NiFi, such as Redis.
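The lack of replication described in the note can be pictured with a small sketch. This is only a conceptual model: the `Node` and `CacheClient` classes and the `nifi-1` to `nifi-3` hostnames are hypothetical and do not correspond to NiFi classes.

```python
class Node:
    """One cluster host running its own independent in-memory cache server."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.cache = {}


class CacheClient:
    """Client pinned to exactly one server hostname, with no failover."""

    def __init__(self, nodes, server_hostname):
        self.server = next(n for n in nodes if n.hostname == server_hostname)

    def put(self, key, value):
        self.server.cache[key] = value

    def contains(self, key):
        return key in self.server.cache


# Every node starts a cache server, but the client only ever talks to nifi-1.
nodes = [Node("nifi-1"), Node("nifi-2"), Node("nifi-3")]
client = CacheClient(nodes, server_hostname="nifi-1")
client.put("flowfile-123", "seen")

for node in nodes:
    print(node.hostname, "flowfile-123" in node.cache)
```

Only `nifi-1` ends up holding the entry. If that host is lost, the other cache servers have nothing to take over with, which is why an external cache service is the suggested route for HA.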

 

 

Thank you,
Matt

3 REPLIES

New Contributor

What do you mean by "Persistence Directory"?

 

Master Mentor

@jirungaray 

The DistributedMapCacheServer controller service sets up a cache server that keeps all cached objects in NiFi's JVM heap memory. Unless the "Persistence Directory" is configured, this cache is lost if the controller service is disabled and re-enabled or if NiFi restarts. The persistence directory is a local disk directory where cache entries are persisted in addition to being held in heap memory. Persisting to disk allows the in-memory cache to be reloaded when the cache server is disabled and re-enabled or NiFi is restarted. I assume this is the cache server you are currently using.
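As a rough picture of what a persistence directory buys you, here is a minimal sketch of an in-memory cache that is also written to disk and reloaded on startup. It is not how NiFi stores its cache: the `PersistentCache` class, the `cache.json` file name, and the JSON format are all assumptions made for illustration.

```python
import json
import os
import tempfile


class PersistentCache:
    """In-memory cache that optionally mirrors its entries to a directory on disk."""

    def __init__(self, persistence_dir=None):
        self.entries = {}
        # Hypothetical file name; with no directory, the cache is memory-only.
        self.path = os.path.join(persistence_dir, "cache.json") if persistence_dir else None
        if self.path and os.path.exists(self.path):
            with open(self.path) as f:
                self.entries = json.load(f)  # reload entries persisted by a previous run

    def put(self, key, value):
        self.entries[key] = value
        if self.path:
            with open(self.path, "w") as f:
                json.dump(self.entries, f)  # persist in addition to keeping it in memory


def survives_restart(persistence_dir):
    """Cache a key, then build a fresh cache object to simulate a restart."""
    first = PersistentCache(persistence_dir)
    first.put("flowfile-123", "seen")
    restarted = PersistentCache(persistence_dir)
    return "flowfile-123" in restarted.entries


with tempfile.TemporaryDirectory() as directory:
    with_dir = survives_restart(directory)
without_dir = survives_restart(None)
print(with_dir, without_dir)
```

With a directory configured the entry is still there after the simulated restart; without one, the new cache starts empty.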

Matt