Content Repository and Archival

Hi, I'm trying to understand how the content repository, its sizing/capping, and archiving are related. I've been reading through the Admin Guide: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

By default the archive feature is disabled.

1 ACCEPTED SOLUTION

Rising Star

When content in the Content Repository is no longer needed (i.e., no FlowFile references the content), NiFi will do one of two things. If archiving is enabled, it will just mark the content as archived. If archiving is disabled, it will delete the content.

The "nifi.content.repository.archive.max.usage.percentage" property in nifi.properties can be used to control the size of the archive. The default value is 50%. This means that once 50% of the disk where the content repository resides is used up, NiFi will start deleting the oldest archived content in order to prevent more than 50% usage. Note that this is not the same as saying the archive itself may occupy 50% of the disk. Rather, it says: delete as much of the archived data as needed to stay below 50% total disk usage.

The "nifi.content.repository.archive.max.retention.period" property can be used to ensure that data is not kept in the archive for longer than a given time period. For instance, setting this to "2 days" means that any data that has been archived for 2 days will be deleted, even if only 1% of the disk space is used up. This is often used for compliance purposes.
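
To make the interaction of the two properties concrete, here is a rough Python sketch of the decision described above. It is a toy model, not NiFi's actual implementation: `ArchivedClaim` and `select_for_deletion` are hypothetical names invented for illustration, and the order in which the two rules are applied is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ArchivedClaim:
    """Hypothetical stand-in for one archived piece of content."""
    name: str
    size_bytes: int
    age_seconds: float  # time since the content was archived


def select_for_deletion(claims, disk_used_bytes, disk_total_bytes,
                        max_usage_fraction=0.50,
                        max_retention_seconds=2 * 24 * 3600):
    """Return the names of archived claims this toy model would delete."""
    to_delete = []
    remaining = []
    used = disk_used_bytes

    # Rule 1 (retention period): anything archived for too long is deleted,
    # no matter how empty the disk is.
    for claim in claims:
        if claim.age_seconds >= max_retention_seconds:
            to_delete.append(claim.name)
            used -= claim.size_bytes
        else:
            remaining.append(claim)

    # Rule 2 (usage percentage): delete the oldest archived content first
    # until total disk usage drops below the threshold. The threshold is on
    # the whole disk, not on the size of the archive itself.
    limit = max_usage_fraction * disk_total_bytes
    for claim in sorted(remaining, key=lambda c: c.age_seconds, reverse=True):
        if used < limit:
            break
        to_delete.append(claim.name)
        used -= claim.size_bytes

    return to_delete
```

Note that in this model, data written by anything else on the same disk counts toward `disk_used_bytes`, so the archive can shrink to nothing even though it never came close to half the disk on its own.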


3 REPLIES

To close the loop on some offline discussions, here are two scenarios that may help with the understanding:

  1. Archiving disabled. No FlowFiles reference the content any more (e.g. those processors have already been removed). The content is deleted (when reaching the overall threshold, to free up disk space). Provenance may still hold the event metadata, since it has separate retention policies.
  2. Archiving enabled. No FlowFiles reference the content any more. The content will still be available in the archive for the lifetime of the archive.
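
If it helps to check which scenario a given install falls into, a small Python sketch along these lines could read the relevant switch from the text of nifi.properties. The helper names `parse_properties` and `unreferenced_content_fate` are hypothetical. Treating a missing key as "disabled" is an assumption of this sketch, so check the default for your NiFi version.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def unreferenced_content_fate(props):
    """Say what happens to content that no FlowFile references any more."""
    # Assumption: a missing key is treated as archiving disabled.
    enabled = props.get("nifi.content.repository.archive.enabled", "false")
    if enabled.lower() == "true":
        return "archived"  # scenario 2: kept for the lifetime of the archive
    return "deleted"       # scenario 1: removed to free up disk space


sample = """
# excerpt of a nifi.properties file (example values)
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=2 days
nifi.content.repository.archive.max.usage.percentage=50%
"""
print(unreferenced_content_fate(parse_properties(sample)))
```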

Explorer

Very old topic, but still valid. Do I understand correctly that if provenance is not enabled, it makes no sense to have the content repository archive enabled? And if both are enabled, how can we recover a single flow via the provenance window?