
How to find the flowfiles which are still holding the current content claim and causing the content repository to increase in size

New Contributor

As we know, the FlowFiles shown as queued in the UI don't match the actual content repo size. Is there a way (REST API?) to identify which queued FlowFiles from a batch are still holding a content claim?

Re: How to find the flowfiles which are still holding the current content claim and causing the content repository to increase in size

Master Guru

@neeraj yadav

-

There is no existing NiFi REST API endpoint that will return exactly what you are looking for.

-

That being said, you could add an additional indexed field to your provenance configuration so that it starts indexing the "ContentClaimIdentifier" associated with each FlowFile event generated. This would give you the ability to use Provenance to find all the FlowFiles associated with a specific claim in the content repo.

-

To add this, simply add "ContentClaimIdentifier" to the list of existing indexed fields in the nifi.properties file. Here is the specific property you will be editing:

-

nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship, ContentClaimIdentifier

-

NOTE: a restart of NiFi will need to occur before indexing of this new field will begin. NiFi will not go back and re-index existing events. It will only start adding this new indexed field to events created from this point forward.
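
If you would rather script the nifi.properties change than edit the file by hand, here is a minimal Python sketch. It only rewrites the text of the file; the function name add_indexed_field is my own for illustration and is not part of NiFi. Back up nifi.properties before writing anything back to it.

```python
# Sketch only: appends a field to nifi.provenance.repository.indexed.fields.
# add_indexed_field is a hypothetical helper name, not a NiFi utility.
KEY = "nifi.provenance.repository.indexed.fields"
FIELD = "ContentClaimIdentifier"


def add_indexed_field(properties_text: str, field: str = FIELD) -> str:
    """Return the properties text with `field` present in the indexed-fields list."""
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(KEY + "="):
            value = stripped.split("=", 1)[1]
            fields = [f.strip() for f in value.split(",") if f.strip()]
            if field not in fields:  # leave the line alone if already configured
                fields.append(field)
            line = KEY + "=" + ", ".join(fields)
        out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if properties_text.endswith("\n"):
        result += "\n"
    return result
```

You would read the file contents, pass them through add_indexed_field, and write the result back. Running it a second time changes nothing, since the field is only appended when it is missing.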

-

You can then search your content repo for any content claim files that are of concern (for example, any very old or very large claims). Simply copy the claim number (the filename of the claim) and search for it via a provenance query. Once you have your list of FlowFile events all tied to the same claim, you would need to look at the lineage of each of those FlowFiles to see if any of them made it to a DROP event (a DROP event means the FlowFile is no longer anywhere in your dataflows).
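
The two manual steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a NiFi tool: it assumes claim files sit in section subdirectories of the content repository and that already-archived claims live in folders named "archive", which are skipped. It also assumes you have already collected the provenance events for each FlowFile as dictionaries carrying "flowFileUuid" and "eventType" keys. Both function names and the thresholds are hypothetical.

```python
import os
import time
from pathlib import Path


def find_suspect_claims(repo_dir, min_age_days=7.0,
                        min_size_bytes=100 * 1024 * 1024, now=None):
    """List (claim_filename, size_bytes, age_days) for old or large claim files."""
    now = time.time() if now is None else now
    suspects = []
    for root, dirs, files in os.walk(repo_dir):
        # Assumption: archived claims are kept in "archive" folders; skip them.
        dirs[:] = [d for d in dirs if d != "archive"]
        for name in files:
            st = (Path(root) / name).stat()
            age_days = (now - st.st_mtime) / 86400
            if age_days >= min_age_days or st.st_size >= min_size_bytes:
                suspects.append((name, st.st_size, age_days))
    # Largest claims first.
    return sorted(suspects, key=lambda s: s[1], reverse=True)


def flowfiles_without_drop(events):
    """Return the FlowFile UUIDs that have no DROP event in `events`."""
    seen, dropped = set(), set()
    for event in events:
        uuid = event["flowFileUuid"]
        seen.add(uuid)
        if event["eventType"] == "DROP":
            dropped.add(uuid)
    return sorted(seen - dropped)
```

Each claim filename returned by find_suspect_claims is what you would paste into the provenance search. Any UUID returned by flowfiles_without_drop is a FlowFile that may still be queued somewhere and holding the claim, provided the events you passed in cover its full lineage.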

-

I created the following article to better lay out the details of the above process:

https://community.hortonworks.com/articles/227048/how-to-determine-which-flowfiles-are-associated-to...

-

Thank you,

Matt

-

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.