Understanding Content Repository contents

Hi,

We recently had an issue where our Content Repository filled up completely, and I am trying to understand what is actually stored in it.

From the articles I have read, what is shown at canvas level is not a reliable indication of actual storage use.

Our 100 GB content repository filled up. I temporarily alleviated it by clearing an unused Error queue, which allowed other flows to continue to function. One of our flows had around 1.7 million FlowFiles in its queue, each between 1 and 10 KB in size.

My understanding of Content Claims is based on their configuration, which we currently have as:

nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100

Given the above, I assume each Content Claim holds at most 100 pieces of content.

So that would equate to at most 17.5 GB in content claims, and I can understand that space being released once the FlowFiles were processed. What I am struggling to understand is that around 60 to 70 GB was released, and I am unsure what else the repository was holding.

Do the attributes we were extracting for each file take up storage that is not obviously visible? Or is the content compressed and then decompressed at file level? (P.S. There were no files in the archive folders; I checked that before the space was released.)

I understand our Content Repository is fairly small, but I want to understand what makes up its storage so that we can manage it in the future once we increase the size of this location.
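For reference, the arithmetic behind that estimate can be sketched as below. This is only a rough Python sketch of my own assumption, not a description of how NiFi actually accounts for space: the function name `estimate_claim_usage` is made up for illustration, 1 MB is treated as 1,000,000 bytes and 1 KB as 1,000 bytes, and every claim is assumed to be filled to both configured limits.

```python
import math


def estimate_claim_usage(flowfile_count, max_flowfiles_per_claim, max_claim_bytes):
    """Upper bound on claim count and bytes, ASSUMING every claim is full.

    Hypothetical helper for illustration only; NiFi exposes no such function.
    """
    claims = math.ceil(flowfile_count / max_flowfiles_per_claim)
    return claims, claims * max_claim_bytes


if __name__ == "__main__":
    queued = 1_700_000          # FlowFiles in the queue (from the post)
    per_claim = 100             # nifi.content.claim.max.flow.files
    claim_bytes = 1_000_000     # nifi.content.claim.max.appendable.size (1 MB, decimal)

    claims, upper_bound = estimate_claim_usage(queued, per_claim, claim_bytes)
    print(f"claims: {claims}, upper bound: {upper_bound / 1e9:.1f} GB")

    # Raw content size for comparison, at 1 KB and 10 KB per FlowFile.
    low, high = queued * 1_000, queued * 10_000
    print(f"raw content: {low / 1e9:.1f} GB to {high / 1e9:.1f} GB")
```

With these inputs the bound comes out at roughly 17 GB, in line with my figure above and well short of the 60 to 70 GB that was released.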

On a final note, I have just read:

https://community.hortonworks.com/articles/227048/how-to-determine-which-flowfiles-are-associated-to...

We will implement the approach from that article to help track large or old Content Claims in the future.
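In the meantime, something like the following could give a first look at which files on disk are the largest and how old they are. This is a minimal sketch and not the method from the linked article: the function name `largest_claim_files` and the repository path are assumptions for illustration, and it only lists files on disk without mapping them back to FlowFiles.

```python
import os
import time


def largest_claim_files(repo_dir, top_n=10):
    """Return up to top_n (size_bytes, mtime, path) tuples, largest first.

    Hypothetical helper; it simply walks the directory tree under repo_dir.
    """
    entries = []
    for root, _dirs, files in os.walk(repo_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                # The file may have been removed while we were scanning.
                continue
            entries.append((st.st_size, st.st_mtime, path))
    entries.sort(reverse=True)
    return entries[:top_n]


if __name__ == "__main__":
    # ASSUMPTION: replace with the directory your content repository
    # is actually configured to use.
    repo = "./content_repository"
    now = time.time()
    for size, mtime, path in largest_claim_files(repo):
        age_hours = (now - mtime) / 3600
        print(f"{size:>12} bytes  {age_hours:8.1f} h old  {path}")
```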
