Support Questions

richard_d_corfi · ‎07-18-2017

Hi

We are experiencing a problem with an apparently large queue that can't be processed through a MergeContent processor.

Our problem looks very similar to the one described on the forum here:

https://community.hortonworks.com/questions/91051/nifi-error-not-the-most-recent-version-of-this-flo...

and also here: https://issues.apache.org/jira/browse/NIFI-3329

Our logs are full of:

FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ] is not the most recent version of this FlowFile within this session

and

FlowFileHandlingException: StandardFlowFileRecord[uuid=<UUID> ... ]... is not known in this session

It seems to me, as has been suggested elsewhere, that there may be some kind of corruption of our repositories. The messages appear to indicate that there are inconsistencies in the repositories or database.

I was wondering how we could repair this situation. I can't find any tools that could allow me to analyse/verify the flowfile or provenance repositories.

Furthermore, if we can't repair this situation, how would we go about safely cleaning out the repository?

So my questions are:

1) Are there any tools for analysing, verifying, or repairing NiFi repositories on disk?

2) What would be the process of clearing out the repositories on disk if we wanted to start with a clean slate?

Thanks

Richard

Wynner · ‎07-18-2017

@Richard Corfield

Question 1: there is currently no tool to do what you need.

Question 2: the process you should follow to clear the repos is to stop NiFi, clean out the flowfile and content repos, then restart NiFi. You should not see any issues in the log file after that.
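The clean-out step in the middle of that sequence could be scripted roughly as follows. This is a minimal sketch, not an official NiFi tool: it assumes the repositories live in the default `flowfile_repository` and `content_repository` directories under the NiFi home directory (check `nifi.flowfile.repository.directory` and `nifi.content.repository.directory.default` in `nifi.properties` if yours were moved), and the function name `clear_repositories` is hypothetical. NiFi must already be stopped before it runs, and every FlowFile still queued in the dataflow is lost.

```python
import shutil
from pathlib import Path

# Assumed default directory names, relative to the NiFi home directory.
DEFAULT_REPOS = ("flowfile_repository", "content_repository")


def clear_repositories(nifi_home, repos=DEFAULT_REPOS):
    """Empty each repository directory but keep the directory itself.

    Run only while NiFi is stopped. Returns the directories that were cleared.
    """
    cleared = []
    for name in repos:
        repo = Path(nifi_home) / name
        if not repo.is_dir():
            continue  # not at the assumed location, so leave it alone
        for entry in repo.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove a subdirectory and its contents
            else:
                entry.unlink()  # remove a plain file or a symlink
        cleared.append(repo)
    return cleared
```

A typical run would be `bin/nifi.sh stop`, then `clear_repositories("/opt/nifi")` (the path is only an example), then `bin/nifi.sh start`. The provenance repository is deliberately left out of the default list.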

What is the issue you are seeing with the MergeContent processor? What is the configuration of the MergeContent processor and how many/how large a file are you trying to create?

richard_d_corfi · ‎07-19-2017

I've heard from other sources that the provenance directory should also be cleaned out. Do you think that's necessary or not?

MattWho · ‎07-26-2017

@Richard Corfield

The Provenance repo has no impact on the functionality of your dataflow. All the FlowFiles currently queued in your dataflow are directly tied to the content in the FlowFile and Content repositories.

The data stored in your provenance repository has a configured lifespan (default 24 hours or 1 GB disk usage) and should be cleared automatically by NiFi based on those thresholds.
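To see which thresholds apply on a given node, the retention settings can be read out of `nifi.properties`. The following is a small sketch, assuming the two limits are set through the `nifi.provenance.repository.max.storage.time` and `nifi.provenance.repository.max.storage.size` properties; the helper names `read_properties` and `provenance_retention` are hypothetical, and the parser only handles simple `key=value` lines.

```python
# Property names assumed to hold the provenance retention limits.
RETENTION_KEYS = (
    "nifi.provenance.repository.max.storage.time",
    "nifi.provenance.repository.max.storage.size",
)


def read_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def provenance_retention(text):
    """Return the retention settings, with None for any that are not set."""
    props = read_properties(text)
    return {key: props.get(key) for key in RETENTION_KEYS}
```

Passing in the contents of `conf/nifi.properties`, for example `provenance_retention(open("conf/nifi.properties").read())`, returns a dictionary of the two values as they are configured on that node.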

srijitachaturve · ‎11-14-2017

Hi Matt,

Does clearing the FlowFile and content repositories also clear the state of the processors?

richard_d_corfi · ‎07-19-2017

Hi @Wynner, thanks for the answers. It would be really good if there were some kind of tool that could at least dump out the state of the repositories so that we can try to understand more about what's going on. Something for the future, perhaps.

I'm not sure if the issue is with the MergeContent processor exactly. I certainly don't understand why the repo has entered this state where FlowFiles either cannot be found or appear to be stale. The problem is that even with the processor running the queue does not get processed because the flowfiles for the given IDs can't be found or are stale.

We have min/max group size 64MB-256MB, min # of entries: 1, max # of entries: 10000, Max Bin Age: 1 Mins, max # of Bins: 100, Delimiter strategy: Text, Attribute strategy: Keep All Unique Attributes. Run Schedule: 0 sec

How to recover or clean a corrupt NiFi FlowFile and/or Provenance Repository?
