Member since: 10-18-2013
Posts: 11
Kudos Received: 0
Solutions: 3
11-24-2016 10:29 PM
@Matt I'm not 100% sure swapping was the problem here. I modified all of the flows to avoid large queues and bumped the swap threshold to 40000, and we're still seeing disk growth and "unknown file" messages on reboot. I did notice something odd: some of the flows route their "error" or "failure" relationship back to the same processor instead of auto-terminating it. I'm not sure whether this is good practice, or whether it could contribute to the problem.
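A simplified sketch of the swap-threshold arithmetic, assuming the threshold mentioned above is NiFi's per-connection nifi.queue.swap.threshold property and treating every queued flow file past it as eligible to be swapped out of the JVM heap. NiFi's real swapping is done in batches, so the split_queue helper below is hypothetical and only illustrates how a 40-50K queue compares to a 40000 threshold.

```python
# Simplified model: flow files beyond the swap threshold are treated as
# candidates for swapping out of the heap. NiFi swaps in batches, so this
# only illustrates the arithmetic, not the real mechanism.
SWAP_THRESHOLD = 40_000  # value from the post (assumed: nifi.queue.swap.threshold)


def split_queue(queued: int, threshold: int = SWAP_THRESHOLD) -> tuple[int, int]:
    """Return (kept_in_heap, eligible_for_swap) for one connection's queue."""
    if queued < 0 or threshold < 0:
        raise ValueError("counts must be non-negative")
    in_heap = min(queued, threshold)
    return in_heap, queued - in_heap


for size in (7_500, 40_000, 50_000):
    print(size, split_queue(size))
# 7500 (7500, 0)
# 40000 (40000, 0)
# 50000 (40000, 10000)
```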
11-22-2016 07:42 PM
We were definitely swapping. We had a bunch of queues in excess of 40-50K. In all cases, the culprit was a merge processor trying to build buckets that were too big and waiting too long to fill them. I've modified the flows and stacked two merge processors one behind the other (the first one has a max of 1,000 items, the second one does the actual merging to our specific size). I'll monitor the situation and see if the problem occurs again. I'm down to 7-8K flow files in total (from 450K+).
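The two-stage layout above can be modeled roughly as follows. This is an illustrative sketch, not how NiFi's merge processors are implemented: stage1 caps each bundle at 1,000 items (the number from the post), and stage2 combines bundles until a target size is reached. TARGET_BYTES and the 5,845-byte item size are hypothetical stand-ins. The point it shows is that the second stage only has to hold 25 bundles instead of 25,000 individual items.

```python
from typing import Iterable, Iterator, List

# Illustrative model of stacking two merge steps, NOT NiFi's implementation.
# Sizes are in bytes. STAGE1_MAX_ITEMS comes from the post; TARGET_BYTES is a
# hypothetical stand-in for "our specific size".
STAGE1_MAX_ITEMS = 1000
TARGET_BYTES = 64 * 1024 * 1024


def stage1(sizes: Iterable[int], max_items: int = STAGE1_MAX_ITEMS) -> Iterator[List[int]]:
    """Group incoming item sizes into small bundles of at most max_items."""
    bundle: List[int] = []
    for size in sizes:
        bundle.append(size)
        if len(bundle) == max_items:
            yield bundle
            bundle = []
    if bundle:
        yield bundle  # flush the final partial bundle


def stage2(bundles: Iterable[List[int]], target: int = TARGET_BYTES) -> Iterator[int]:
    """Combine stage-1 bundles until the running total reaches target bytes."""
    total = 0
    for bundle in bundles:
        total += sum(bundle)
        if total >= target:
            yield total
            total = 0
    if total:
        yield total  # flush whatever is left at the end of the stream


sizes = [5_845] * 25_000  # hypothetical load: 25K items of 5,845 bytes each
first = list(stage1(sizes))
merged = list(stage2(first))
print(len(first), "stage-1 bundles ->", len(merged), "stage-2 outputs")
# 25 stage-1 bundles -> 3 stage-2 outputs
```

With a single merge step, all 25,000 items would sit in one queue while the bin fills; with two steps, the large queue is drained in 1,000-item chunks.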
11-21-2016 08:44 PM
Here's the actual error message. We get tons of them (more than 100K) during the restart:
2016-11-21 20:41:43,056 INFO [main] o.a.n.c.repository.FileSystemRepository Found unknown file [nifipath]/content_repository/39/1479172392813-1092647 (5845 bytes) in File System Repository; removing file
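To quantify how much orphaned content a restart removes, the "Found unknown file" lines can be counted and their sizes summed. This is a sketch that assumes every such line has the exact shape of the message quoted above (a path without spaces followed by "(N bytes)"); other NiFi versions may word it differently. It could be fed the lines of the node's application log (commonly logs/nifi-app.log, though the path depends on the install).

```python
import re
from typing import Iterable, Tuple

# Pattern derived from the single FileSystemRepository line quoted above.
# Assumes the path contains no spaces (\S+).
UNKNOWN_FILE = re.compile(r"Found unknown file (?P<path>\S+) \((?P<size>\d+) bytes\)")

SAMPLE = (
    "2016-11-21 20:41:43,056 INFO [main] o.a.n.c.repository.FileSystemRepository "
    "Found unknown file [nifipath]/content_repository/39/1479172392813-1092647 "
    "(5845 bytes) in File System Repository; removing file"
)


def summarize_unknown_files(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for 'Found unknown file' log lines."""
    count = 0
    total = 0
    for line in lines:
        match = UNKNOWN_FILE.search(line)
        if match:
            count += 1
            total += int(match.group("size"))
    return count, total


print(summarize_unknown_files([SAMPLE, "unrelated line"]))  # (1, 5845)
```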
11-21-2016 08:33 PM
Thanks @Matt,
Clearing the queues does not seem to help. I'm restarting one of the nodes right now; I'll be able to share the exact message when it boots and discovers the files that should not be there. It sounds a lot like we're hitting the bug. Is there a timeline for the 1.1.0 release? Reading the mailing lists, it seems to be really close to an RC.
Thanks,
Phil
11-21-2016 07:14 PM
Hello,
This is my first time posting here, so apologies if this is in the wrong section or format.
First, some background: we started a POC using NiFi 1.0.0. We're using a 3-node cluster with limited resources (this is a POC...). Each node has 16 cores, 32 GB of RAM and 2 volumes: a RAID 1 volume for the OS and a RAID 10 volume on 2.5-inch spindles. I know this is not a recommended setup, but the content repo, the provenance repo and the FlowFile repo (basically everything) are on the same RAID 10 array. The disks are heavily used right now. Content repo archiving is disabled.
Now here's the thing: every 2-3 days, the disk fills up. Right now, the UI reports 450,000 flow files (3.21 GB) in queue. I would expect roughly the same amount of data in the nifi/content_repository folder, but that's not the case: on one of the nodes, the content_repository folder is 73 GB. I can't tell how big it is on the other two nodes, since "du -h" is still running after 10 minutes, but using "df" I estimate around 700-800 GB on each.
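On the one node that was measured, 73 GB on disk against 3.21 GB queued is roughly 23 times more content than expected. A per-section breakdown of content_repository can show whether that excess is spread evenly or concentrated in a few subdirectories. The sketch below sums apparent file sizes (st_size) under each top-level subdirectory; unlike "du" it does not count allocated blocks, and it is not claimed to be any faster. The ./content_repository path comes from the config below; size_by_section is a hypothetical helper.

```python
import os
from typing import Dict


def size_by_section(repo_root: str) -> Dict[str, int]:
    """Sum apparent file sizes (bytes) under each top-level subdirectory."""
    totals: Dict[str, int] = {}
    for entry in os.scandir(repo_root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # loose files at the top level are ignored
        section_total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    section_total += os.lstat(os.path.join(dirpath, name)).st_size
                except FileNotFoundError:
                    pass  # file was removed (e.g. by NiFi) while we were walking
        totals[entry.name] = section_total
    return totals


if __name__ == "__main__":
    repo = "./content_repository"  # nifi.content.repository.directory.default
    if os.path.isdir(repo):
        sizes = size_by_section(repo)
        for section, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{section}\t{size / 1024 ** 3:.2f} GiB")
        print(f"total\t{sum(sizes.values()) / 1024 ** 3:.2f} GiB")
```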
When we restart one of the nodes, it can take hours while NiFi cleans up the content_repository and fills the log with "unknown file" messages.
Any ideas / Suggestions? This is running on CentOS 6.
Thanks
Here's the relevant config section:
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=1 hours
nifi.content.repository.archive.max.usage.percentage=1%
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false
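When comparing nodes, it can help to read these settings programmatically rather than by eye. The sketch below is a minimal reader for simple key=value lines like the excerpt above. It is not a full Java .properties parser (line continuations, escapes and ":" separators are ignored), and period_seconds and percentage are hypothetical helpers for the value formats seen here ("1 hours", "1%"). It says nothing about how NiFi itself interprets these values.

```python
import re
from typing import Dict

CONFIG = """\
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=1 hours
nifi.content.repository.archive.max.usage.percentage=1%
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Read simple key=value lines, skipping blanks and #/! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Hypothetical unit table for values like "1 hours"; extend as needed.
_UNIT_SECONDS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600, "day": 86400, "days": 86400,
}


def period_seconds(value: str) -> int:
    """Convert a '<number> <unit>' string such as '1 hours' to seconds."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", value.strip())
    if not match or match.group(2).lower() not in _UNIT_SECONDS:
        raise ValueError(f"unrecognized time period: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2).lower()]


def percentage(value: str) -> float:
    """Convert a string such as '1%' to a fraction (0.01)."""
    return float(value.strip().rstrip("%")) / 100.0


props = parse_properties(CONFIG)
print(props["nifi.content.repository.archive.enabled"])  # false
```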
Labels: Apache NiFi