About AceWinner

AceWinner · ‎11-24-2016

@Matt I'm not 100% sure swapping was the problem here. I modified all of the flows to avoid building up big queues and bumped the swap threshold to 40000, and we're still seeing disk growth plus the "unknown file" messages on reboot. I did notice something odd: some of the flows have their "error" or "failure" relationship routed back to the processor itself instead of being auto-terminated. I'm not sure whether this is good practice, or whether it could contribute to the problem?

AceWinner · ‎11-22-2016

We were definitely swapping. We had a bunch of queues in excess of 40-50K. In all cases, the culprit was a merge processor trying to build buckets that were too big and waiting too long to fill them. I've modified the flows and stacked 2 merge processors one behind the other (the first one has a max of 1000 items, the 2nd one does the actual merging to our specific size). I'll monitor the situation and see if the problem occurs again. I'm down to 7-8K flow files in total (from 450K+).
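For readers who want to picture the two-stage arrangement, here is a minimal sketch in plain Python. It is not NiFi code and does not use any NiFi API: the function names `bundle_by_count` and `merge_to_size` are hypothetical, and the byte strings merely stand in for flow file content. The only values taken from the post are the 1000-item cap on the first stage and the idea that the second stage merges to a target size.

```python
from typing import Iterable, Iterator, List


def bundle_by_count(items: Iterable[bytes], max_items: int) -> Iterator[bytes]:
    """Stage 1: concatenate up to max_items small items into one bundle."""
    batch: List[bytes] = []
    for item in items:
        batch.append(item)
        if len(batch) == max_items:
            yield b"".join(batch)
            batch = []
    if batch:
        # Flush the final, partially filled bundle.
        yield b"".join(batch)


def merge_to_size(bundles: Iterable[bytes], min_bytes: int) -> Iterator[bytes]:
    """Stage 2: accumulate bundles until the output reaches min_bytes."""
    pending: List[bytes] = []
    pending_size = 0
    for bundle in bundles:
        pending.append(bundle)
        pending_size += len(bundle)
        if pending_size >= min_bytes:
            yield b"".join(pending)
            pending, pending_size = [], 0
    if pending:
        # Flush whatever is left, even if it is below the target size.
        yield b"".join(pending)


# 2500 small items of 10 bytes each (made-up sizes for illustration).
small_items = [b"x" * 10] * 2500
stage_one = list(bundle_by_count(small_items, max_items=1000))
stage_two = list(merge_to_size(stage_one, min_bytes=20_000))
```

The point of the split is that the second stage only ever waits on a handful of pre-bundled items rather than thousands of small ones, which is what should keep the queue in front of it short.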

AceWinner · ‎11-21-2016

Here's the actual error message. We get tons of them (more than 100k) during the restart:

2016-11-21 20:41:43,056 INFO [main] o.a.n.c.repository.FileSystemRepository Found unknown file [nifipath]/content_repository/39/1479172392813-1092647 (5845 bytes) in File System Repository; removing file
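With more than 100k of these lines per restart, it can help to tally them instead of reading them. The following is a rough sketch, assuming every message has exactly the format of the line quoted above; the function name `summarize_unknown_files` is made up for this example.

```python
import re

# Matches the FileSystemRepository cleanup message quoted above.
UNKNOWN_FILE = re.compile(
    r"Found unknown file (?P<path>\S+) \((?P<size>\d+) bytes\) "
    r"in File System Repository; removing file"
)


def summarize_unknown_files(lines):
    """Return (count, total_bytes) over all matching cleanup messages."""
    count = 0
    total_bytes = 0
    for line in lines:
        match = UNKNOWN_FILE.search(line)
        if match:
            count += 1
            total_bytes += int(match.group("size"))
    return count, total_bytes
```

Passing an open log file handle to `summarize_unknown_files` gives the number of orphaned files removed and how many bytes they occupied, which can then be compared with the disk growth seen between restarts.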

AceWinner · ‎11-21-2016

Thanks @Matt. Clearing the queues does not seem to help. I'm restarting one of the nodes right now, so I'll be able to share the exact message when it boots and discovers the files that should not be there. It sounds a lot like we're hitting the bug. Is there a timeline for the release of 1.1.0? Reading the mailing lists, it seems to be really close to RC.

Thanks,
Phil

AceWinner · ‎11-21-2016

Hello,

First time posting here, so sorry if this is in the wrong section or the wrong format.

First, some background: we started a POC using NiFi 1.0.0. We're using a 3-node cluster with limited resources (this is a POC...). Each node has 16 cores, 32 GB of RAM and 2 volumes: a RAID 1 volume for the OS and a RAID 10 volume on 2.5in spindles. I know this is not a recommended setup, but the content repo, the provenance repo, the flow files, basically everything, is on the same RAID 10 array. The disks are heavily used right now. Content repo archiving is disabled.

Now here's the thing: every 2-3 days, the disk fills up. Right now, the UI reports that we have 450 000 flow files (3.21 GB) in queue. I would expect to have roughly the same amount of data in the nifi/content_repository folder, but that's not the case: on one of the nodes, the content_repository folder is 73 GB. I can't tell how big it is on the 2 other nodes since the "du -h" operation is still running after 10 minutes, but using "df" I can estimate around 700-800 GB on each.

When we restart one of the nodes, it can take hours while the process cleans the content_repository folder and spams the log with a bunch of "unknown file" messages.

Any ideas or suggestions? This is running on CentOS 6.

Thanks

Here's the relevant config section:

nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=1 hours
nifi.content.repository.archive.max.usage.percentage=1%
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false
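When comparing nodes, it can be handy to pull just the content repository settings out of each node's properties file. Below is a small sketch that parses simple key=value lines; the helper names `parse_properties` and `content_repo_settings` are hypothetical, and the parser deliberately ignores the more exotic parts of the Java properties format (line continuations, escapes).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def content_repo_settings(props):
    """Return only the nifi.content.repository.* entries, prefix removed."""
    prefix = "nifi.content.repository."
    return {
        key[len(prefix):]: value
        for key, value in props.items()
        if key.startswith(prefix)
    }


# The config section quoted in the post above.
SAMPLE = """\
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=1 hours
nifi.content.repository.archive.max.usage.percentage=1%
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false
"""

settings = content_repo_settings(parse_properties(SAMPLE))
```

Running the same function over each node's file and diffing the resulting dictionaries is a quick way to confirm that archiving really is disabled everywhere.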

Last Visited	‎10-21-2013 11:27 AM

Member Since	‎10-18-2013 02:49 PM
Posts	11

Cloudera Community

Re: NiFi 1.0.0 does not seem to be cleaning up its...

Understanding Content Repository Cleanup and Reten...