NiFi: cleaning up FlowFiles after event processing

Expert Contributor

Is there a way to clean up FlowFiles and keep only the attributes after event processing is completed? We need to remove the FlowFiles for security reasons.

Any suggestions on whether we can delete them, or on another way of handling this use case?

1 ACCEPTED SOLUTION

Master Mentor
@nyakkanti

FlowFiles consist of FlowFile attributes and FlowFile content.

- FlowFile attributes are kept in heap during processing and persisted to the FlowFile repository.

- FlowFile content is kept in claims within the content repository.

A claim is moved to the archive once no FlowFile that is still active anywhere in your dataflow points at it. Archiving is enabled by default but can be disabled in the nifi.properties file:

nifi.content.repository.archive.enabled=true

If you disable archiving, the claim is purged from NiFi's content repository rather than being archived.

What is important to understand is how claims work. By default in the nifi.properties file, a claim can contain up to 100 FlowFiles or 10 MB of data (whichever limit is reached first). A claim is therefore not purged until every piece of content in it has completed processing. As long as even one piece of content in that claim is still referenced, the entire claim remains in the content repository.
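To make the claim behavior concrete, here is a rough toy model in Python. It is not NiFi code: the `Claim` and `ContentRepository` classes and their methods are hypothetical names made up for this illustration, and the sketch only models the per-claim FlowFile count limit (not the size limit). It just shows that a claim leaves the live repository only after the last FlowFile pointing at it has finished.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """Toy stand-in for a content claim shared by several FlowFiles."""
    claim_id: int
    active_flowfiles: set = field(default_factory=set)


class ContentRepository:
    """Hypothetical model of claim handling; not the real NiFi implementation."""

    def __init__(self, archive_enabled=True, max_flowfiles_per_claim=100):
        self.archive_enabled = archive_enabled
        self.max_flowfiles_per_claim = max_flowfiles_per_claim
        self.claims = {}     # claim_id -> Claim still held in the repository
        self.archived = []   # claim ids moved to the archive
        self.purged = []     # claim ids deleted outright
        self._current = None
        self._written = 0    # FlowFiles written into the current claim
        self._next_id = 0

    def write(self, flowfile_id):
        """Store content for a FlowFile; start a new claim when the current one is full."""
        if self._current is None or self._written >= self.max_flowfiles_per_claim:
            self._current = Claim(self._next_id)
            self.claims[self._next_id] = self._current
            self._next_id += 1
            self._written = 0
        self._current.active_flowfiles.add(flowfile_id)
        self._written += 1
        return self._current.claim_id

    def complete(self, flowfile_id, claim_id):
        """Mark a FlowFile as finished; release the claim if nothing references it."""
        claim = self.claims[claim_id]
        claim.active_flowfiles.discard(flowfile_id)
        if not claim.active_flowfiles:
            del self.claims[claim_id]
            target = self.archived if self.archive_enabled else self.purged
            target.append(claim_id)
            if self._current is claim:
                self._current = None


# Archiving disabled, and a tiny claim limit so the effect is easy to see.
repo = ContentRepository(archive_enabled=False, max_flowfiles_per_claim=3)
claim_ids = {ff: repo.write(ff) for ff in ["a", "b", "c", "d"]}

# "a" and "b" finish, but "c" shares their claim and is still active,
# so that claim (and the content of "a" and "b") stays in the repository.
repo.complete("a", claim_ids["a"])
repo.complete("b", claim_ids["b"])
print(sorted(repo.claims), repo.purged)
```

In this sketch the content for "a" and "b" only goes away once "c" also completes, which is the point to keep in mind if content has to be removed for security reasons.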

As far as FlowFile attributes are concerned, they are persisted in NiFi provenance based on the retention configured in the nifi.properties file. You can perform provenance searches within NiFi to return FlowFile history and look at the attributes of those FlowFiles at any point in their lineage.

Thanks,

Matt
