How to recover or clean a corrupt NiFi FlowFile and/or Provenance Repository?
Labels: Apache NiFi
Created ‎07-18-2017 04:18 PM
Hi
We are experiencing a problem with an apparently large queue that can't be processed through a MergeContent processor.
Our problem looks very similar to the one described on the forum here:
and also here: https://issues.apache.org/jira/browse/NIFI-3329
Our logs are full of:
FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ] is not the most recent version of this FlowFile within this session
and
FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ]... is not known in this session
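To get a feel for how widespread the two errors are, the log can be summarized with a short script. This is only a minimal sketch: the two message fragments are copied from the lines quoted above, while the `uuid=` pattern and the function name `summarize` are assumptions for illustration and may need adjusting to the actual log layout.

```python
import re
from collections import Counter

# Message fragments copied from the two log lines quoted in this post.
PATTERNS = {
    "stale": re.compile(
        r"FlowFileHandlingException:.*is not the most recent version "
        r"of this FlowFile within this session"
    ),
    "unknown": re.compile(
        r"FlowFileHandlingException:.*is not known in this session"
    ),
}

# Assumption: the uuid is printed as a standard 36-character UUID.
UUID_RE = re.compile(r"uuid=([0-9a-fA-F-]{36})")


def summarize(lines):
    """Count each message kind and collect the distinct FlowFile UUIDs."""
    counts = Counter()
    uuids = {kind: set() for kind in PATTERNS}
    for line in lines:
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[kind] += 1
                match = UUID_RE.search(line)
                if match:
                    uuids[kind].add(match.group(1))
                break  # a line matches at most one kind
    return counts, uuids
```

Passing an open log file to it, for example `summarize(open("logs/nifi-app.log"))` (the path is an assumption about the install layout), shows whether a handful of FlowFiles or a large part of the queue is affected.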
It seems to me, as has been suggested elsewhere, that there may be some kind of corruption of our repositories. The messages appear to indicate that there are inconsistencies in the repositories or database.
I was wondering how we could repair this situation. I can't find any tools that could allow me to analyse/verify the flowfile or provenance repositories.
Furthermore, if we can't repair this situation, how would we go about safely cleaning out the repositories?
So my questions are:
1) Are there any tools for analysing, verifying, or repairing NiFi repositories on disk?
2) What would be the process of clearing out the repositories on disk if we wanted to start with a clean slate?
Thanks
Richard
Created ‎07-18-2017 06:26 PM
Question 1: there is currently no tool that does what you need.
Question 2: to clear the repositories, stop NiFi, clean out the FlowFile and Content repositories, then restart NiFi. You should not see these issues in the log file after that.
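The stop/clean/restart steps can be sketched as a small script. This is a minimal sketch under stated assumptions, not an official procedure: it assumes the repository locations are set by `nifi.flowfile.repository.directory` and `nifi.content.repository.directory.default` in `conf/nifi.properties`, that relative paths resolve against the NiFi home directory, and that NiFi has already been stopped. The helper names are hypothetical. It renames the directories instead of deleting them, so the old data can still be inspected or removed later.

```python
import shutil
import time
from pathlib import Path

# Property names assumed from a stock nifi.properties; check your own file.
# Additional content repository directories, if configured, are not handled.
REPO_KEYS = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.default",
)


def read_properties(path):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def set_aside_repositories(nifi_home, stamp=None):
    """Rename the FlowFile and Content repositories. NiFi must be stopped."""
    home = Path(nifi_home)
    props = read_properties(home / "conf" / "nifi.properties")
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for key in REPO_KEYS:
        value = props.get(key)
        if not value:
            continue
        repo = Path(value)
        if not repo.is_absolute():
            # Assumption: relative paths are relative to the NiFi home.
            repo = home / repo
        if repo.is_dir():
            backup = repo.with_name(f"{repo.name}.bak-{stamp}")
            shutil.move(str(repo), str(backup))
            moved.append(backup)
    return moved
```

A typical sequence would be `bin/nifi.sh stop`, then `set_aside_repositories("/path/to/nifi")` (the path is a placeholder), then `bin/nifi.sh start`. Every FlowFile that was queued in the flow is gone after this, so it really is a clean slate.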
What is the issue you are seeing with the MergeContent processor? What is the configuration of the MergeContent processor and how many/how large a file are you trying to create?
Created ‎07-19-2017 11:07 AM
I've heard from other sources that the provenance directory should also be cleaned out. Do you think that's necessary or not?
Created ‎07-26-2017 02:56 PM
The Provenance repo has no impact on the functionality of your dataflow. All the FlowFiles currently queued in your dataflow are directly tied to the content in the FlowFile and Content repositories.
The data stored in your provenance repository has a configured lifespan (default 24 hours or 1 GB disk usage) and should be cleared automatically by NiFi based on those thresholds.
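To check which thresholds a given install actually uses, the retention settings can be read from `nifi.properties`. A minimal sketch, assuming the settings live under `nifi.provenance.repository.max.storage.time` and `nifi.provenance.repository.max.storage.size`; the fallback values are simply the defaults quoted above and may differ in your release, and the function name is hypothetical.

```python
# Fallbacks are the defaults quoted in this reply, not values read from NiFi.
DEFAULTS = {
    "nifi.provenance.repository.max.storage.time": "24 hours",
    "nifi.provenance.repository.max.storage.size": "1 GB",
}


def provenance_retention(properties_text):
    """Return (max storage time, max storage size) from nifi.properties text."""
    found = dict(DEFAULTS)
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key in found and value:
            found[key] = value
    return (
        found["nifi.provenance.repository.max.storage.time"],
        found["nifi.provenance.repository.max.storage.size"],
    )
```

For example, `provenance_retention(open("conf/nifi.properties").read())` returns the two values as strings, where the path is an assumption about the install layout.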
Created ‎11-14-2017 09:12 AM
Hi Matt,
Does clearing the FlowFile and Content repositories also clear the state of the processors?
Created ‎07-19-2017 09:56 AM
Hi @Wynner, thanks for the answers. It would be really good if there were some kind of tool that could at least dump out the state of the repositories so that we can try to understand more about what's going on. Something for the future, perhaps.
I'm not sure if the issue is with the MergeContent processor exactly. I certainly don't understand why the repo has entered this state where FlowFiles either cannot be found or appear to be stale. The problem is that even with the processor running the queue does not get processed because the flowfiles for the given IDs can't be found or are stale.
We have min/max group size 64MB-256MB, min # of entries: 1, max # of entries: 10000, Max Bin Age: 1 Mins, max # of Bins: 100, Delimiter strategy: Text, Attribute strategy: Keep All Unique Attributes. Run Schedule: 0 sec
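As a way of reasoning about these settings, here is a deliberately simplified model of when a bin would be released for merging. It is a sketch of how I read the property descriptions, not NiFi's actual binning code: the class and function names are hypothetical, and 1 MB is assumed to be 1024 * 1024 bytes.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: binary megabytes


@dataclass
class MergeSettings:
    # Values taken from the configuration listed above.
    min_group_size: int = 64 * MB
    max_group_size: int = 256 * MB
    min_entries: int = 1
    max_entries: int = 10000
    max_bin_age_seconds: float = 60.0


def bin_status(total_bytes, entries, age_seconds, settings=None):
    """Classify a bin as 'full', 'expired', 'eligible' or 'waiting'."""
    s = settings or MergeSettings()
    if total_bytes >= s.max_group_size or entries >= s.max_entries:
        return "full"      # a maximum was reached
    if age_seconds >= s.max_bin_age_seconds:
        return "expired"   # Max Bin Age forces the bin out
    if total_bytes >= s.min_group_size and entries >= s.min_entries:
        return "eligible"  # both minimums are met
    return "waiting"
```

Under this model a bin holding only 10 MB stays "waiting" until it is one minute old and then becomes "expired", so with these settings nothing should sit in a bin for much longer than a minute. That is why a queue that never drains points at the FlowFile errors rather than at the bin thresholds.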
