I am curious to know how risky it is to use the volatile content repository.
My understanding is:
If there is a node failure/restart:
For data that has already been processed/persisted through the flow, there is no impact on our business or downstream systems.
However, users cannot view and/or replay content via the Provenance UI, since the content is gone after the restart.
For FlowFiles whose content was still mid-flow at the time of the node failure/restart, we can't replay them from where they failed once the node is back to normal. Instead, we have to fetch the same files from the source again and reprocess them end to end through the flow.
If the above is correct, I would say that as long as we have the source data permanently persisted somewhere outside of NiFi, we can always reprocess it when data in the volatile content repository is lost. The only loss is the ability to view/replay content via the Provenance UI.
BTW, what happens when content exceeds the maximum size of the repository?
An out-of-memory exception? Is it auto-purged from memory? Auto-archived to disk?
If I set nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository,
does that mean the properties below are automatically disabled?
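For context, this is a sketch of the relevant nifi.properties entries when switching to the volatile implementation. The property names are from a typical NiFi install, but the exact defaults may vary by version, so treat the values as illustrative:

```
# Switch the content repository to the in-memory implementation
nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository
# Cap on how much memory the volatile repository may use
nifi.volatile.content.repository.max.size=100 MB
# Block size used when allocating memory for content
nifi.volatile.content.repository.block.size=32 KB

# Disk-oriented properties such as these apply to the file-based
# implementation and are not used by the volatile one:
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```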
Any comments are appreciated.
What happens when content exceeds the maximum size of the repository? If you reach the configured memory limit in the Volatile Content Repository, then you will not be able to add any new data to the flow until some memory is freed. The only way you would see an out-of-memory exception would be if you configured the nifi.volatile.content.repository.max.size property to a larger value than the memory available on the system.
It does not auto purge from memory and it does not auto archive to disk.
These properties are ignored when using the Volatile Content Repository.
Thank you for your answers.
You said that if the configured memory limit is reached, no more content data can be added to the Volatile Content Repository.
But how do we free memory for the Volatile Content Repository if it can't clean itself up automatically?
Since this day will come sooner or later, this could be an issue that makes the Volatile Content Repository useless.
Thank you for your clarification.
May I assume the Volatile Content Repository only keeps the content for FlowFiles currently running in the flow? Once the FlowFiles are finished, their content is wiped from memory and can't be found in the repository.
In contrast, the file-based Content Repository will keep the content of FlowFiles even after they leave the flow, based on the retention settings.
May I assume the Volatile Content Repository only keeps the content for FlowFiles currently running in the flow? Once the FlowFiles are finished, their content is wiped from memory and can't be found in the repository. --- Correct
In contrast, the file-based Content Repository will keep the content of FlowFiles even after they leave the flow, based on the retention settings. -- The File System Content Repository archives FlowFiles after they are dropped from the flow, based on the two archive properties nifi.content.repository.archive.max.retention.period and nifi.content.repository.archive.max.usage.percentage.
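As a sketch, the archive behavior described above is controlled by entries like these in nifi.properties (the values shown are commonly cited defaults; check the documentation for your NiFi version):

```
# Keep content of dropped FlowFiles around for later viewing/replay
nifi.content.repository.archive.enabled=true
# Archived content is deleted once it is older than this period...
nifi.content.repository.archive.max.retention.period=12 hours
# ...or sooner, once repository disk usage exceeds this percentage
nifi.content.repository.archive.max.usage.percentage=50%
```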
When I set the Volatile Content Repository properties:
It shows me a "Content Repository out of space" exception when I process a 20 MB dataset.
From the code comment, I found: "If no Backup Repository has been specified, when the threshold is exceeded, an IOException will be thrown."
1. How do I set up the Backup Repository for the volatile content repository?
2. Is my setting above for nifi.volatile.content.repository.max.size correct? It seems to still be using the default 100 MB.
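One thing worth double-checking on question 2: NiFi data-size properties are written as a number followed by a unit (B, KB, MB, GB, TB). If the value cannot be parsed, the repository may fall back to its default. A sketch of raising the limit (the value is illustrative; restart NiFi after editing nifi.properties for the change to take effect):

```
# Raise the in-memory content cap from the 100 MB default to 1 GB
nifi.volatile.content.repository.max.size=1 GB
```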
I want to use the volatile content repository, but I'm always facing IO/out-of-memory exceptions, even though I have enough free memory on the system.
- System: 16 GB RAM
- JVM heap max: 4 GB
I have tried to get rid of the errors by changing the nifi.volatile.content.repository.max.size value (100MB, 500MB, 1G, 2G, 3G), but this doesn't have any effect either. The only thing that changes is that the error occurs later when max.size is higher. My assumption was that GC should automatically clean up processed FlowFiles, at the latest when max.size is reached?
Does anybody have an idea?