How to recover or clean a corrupt NiFi FlowFile and/or Provenance Repository?

Hi

We are experiencing a problem with an apparently large queue that can't be processed through a MergeContent processor.

Our problem looks very similar to the one described on the forum here:

https://community.hortonworks.com/questions/91051/nifi-error-not-the-most-recent-version-of-this-flo...

and also here: https://issues.apache.org/jira/browse/NIFI-3329

Our logs are full of:

FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ] is not the most recent version of this FlowFile within this session

and

FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ]... is not known in this session

It seems to me, as has been suggested elsewhere, that our repositories may be corrupted in some way. The messages appear to indicate inconsistencies in the repositories or database.

I was wondering how we could repair this situation. I can't find any tools that could allow me to analyse/verify the flowfile or provenance repositories.

Furthermore, if we can't repair this situation, how would we go about safely cleaning out the repository?

So my questions are:

1) Are there any tools for analysing, verifying, or repairing NiFi repositories on disk?

2) What would be the process of clearing out the repositories on disk if we wanted to start with a clean slate?

Thanks

Richard

5 REPLIES

avatar
@Richard Corfield

Question 1: There is currently no tool that does what you need.

Question 2: To clear the repos, stop NiFi, clean out the FlowFile and content repositories, then restart NiFi. Any FlowFiles that were queued will be lost. You should not see these errors in the log file after that.
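The clean-out step above can be sketched in Python. This is a minimal sketch, not an official NiFi tool: it assumes NiFi is already stopped, that `conf/nifi.properties` sits under the NiFi home directory, and that the repositories are configured through the `nifi.flowfile.repository.directory` and `nifi.content.repository.directory.*` properties. The function names (`read_properties`, `repo_dirs`, `clear_repos`) are hypothetical helpers introduced for illustration. It defaults to a dry run so nothing is deleted until you opt in.

```python
import shutil
from pathlib import Path

# Property names as they appear in a stock nifi.properties; check your own file.
FLOWFILE_KEY = "nifi.flowfile.repository.directory"
CONTENT_PREFIX = "nifi.content.repository.directory."


def read_properties(path):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def repo_dirs(nifi_home, props):
    """Resolve the FlowFile and content repository paths against nifi_home."""
    home = Path(nifi_home)
    return [
        home / value
        for key, value in props.items()
        if key == FLOWFILE_KEY or key.startswith(CONTENT_PREFIX)
    ]


def clear_repos(nifi_home, dry_run=True):
    """Empty each repository directory. NiFi must be stopped first.

    Returns the repository directories that were found. With dry_run=True
    (the default) nothing is deleted.
    """
    props = read_properties(Path(nifi_home) / "conf" / "nifi.properties")
    found = []
    for repo in repo_dirs(nifi_home, props):
        if not repo.is_dir():
            continue
        found.append(repo)
        if dry_run:
            continue
        for child in repo.iterdir():
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
    return found
```

Calling `clear_repos("/opt/nifi")` (a hypothetical install path) only lists the directories it would empty; pass `dry_run=False` to actually delete their contents. The provenance repository is deliberately left alone.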

What is the issue you are seeing with the MergeContent processor? What is the configuration of the MergeContent processor and how many/how large a file are you trying to create?

avatar

I've heard from other sources that the provenance directory should also be cleaned out. Do you think that's necessary or not?

avatar
Super Mentor

@Richard Corfield

The Provenance repo has no impact on the functionality of your dataflow. All the FlowFiles currently queued in your dataflow are directly tied to the content in the FlowFile and Content repositories.

The data stored in your provenance repository has a configured lifespan (default 24 hours or 1 GB disk usage) and should be cleared automatically by NiFi based on those thresholds.

avatar
Rising Star

Hi Matt,

Does clearing the FlowFile and content repositories also clear the state of the processors?

avatar

Hi @Wynner, thanks for the answers. It would be really good if there were some kind of tool that could at least dump out the state of the repositories so that we could try to understand more about what's going on. Something for the future, perhaps.

I'm not sure if the issue is with the MergeContent processor exactly. I certainly don't understand why the repo has entered this state where FlowFiles either cannot be found or appear to be stale. The problem is that even with the processor running, the queue does not get processed, because the FlowFiles for the given IDs can't be found or are stale.

We have min/max group size 64MB-256MB, min # of entries: 1, max # of entries: 10000, Max Bin Age: 1 Mins, max # of Bins: 100, Delimiter strategy: Text, Attribute strategy: Keep All Unique Attributes. Run Schedule: 0 sec
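To make those settings concrete, here is a simplified Python model of when a bin would become eligible for merging. It is a sketch based on my reading of the MergeContent documentation, not NiFi's actual code: it assumes a bin is merged once both minimums are met, or once either maximum or the Max Bin Age is reached. `BinLimits` and `bin_is_ready` are hypothetical names, and the defaults mirror the configuration above.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class BinLimits:
    # Defaults mirror the configuration listed above.
    min_size: int = 64 * MB
    max_size: int = 256 * MB
    min_entries: int = 1
    max_entries: int = 10_000
    max_age_seconds: float = 60.0  # Max Bin Age: 1 min


def bin_is_ready(entries, size_bytes, age_seconds, limits=BinLimits()):
    """Simplified eligibility check; an assumption, not NiFi's implementation."""
    if entries == 0:
        return False  # nothing to merge
    if age_seconds >= limits.max_age_seconds:
        return True  # an old bin is flushed regardless of size
    if entries >= limits.max_entries or size_bytes >= limits.max_size:
        return True  # bin is full
    # Otherwise both minimums must be satisfied.
    return entries >= limits.min_entries and size_bytes >= limits.min_size
```

Under this model, a bin holding 10 MB that is 30 seconds old is not yet eligible, but the same bin is flushed once it passes the one-minute mark, so a healthy queue should keep draining even at low volume.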