Created 06-22-2017 08:16 PM
Hi All,
I'm running into an issue while trying to merge a large number of small files in NiFi. I have about 800K files (350 MB) in the queue at a MergeContent processor, and I'm waiting to accumulate about 1.2 million files before merging them into one large file, but the MergeContent processor is throwing the error below:
MergeContent[id=3104122b-1077-115c-2e71-b264709ceb44] Failed to process bundle of 897788 files due to org.apache.nifi.processor.exception.FlowFileAccessException: Failed to read content of StandardFlowFileRecord[uuid=a2a32c84-f633-4a7a-8b82-2ba5547db9af,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1498156308912-3769, container=default, section=697], offset=429054, length=436953],offset=104885,name=9b425a01-a759-42b6-bcf6-67f9bc79c871,size=302]; rolling back sessions: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to read content of StandardFlowFileRecord[uuid=a2a32c84-f633-4a7a-8b82-2ba5547db9af,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1498156308912-3769, container=default, section=697], offset=429054, length=436953],offset=104885,name=9b425a01-a759-42b6-bcf6-67f9bc79c871,size=302]
2017-06-22 13:37:49,515 ERROR [NiFi logging handler] org.apache.nifi.StdErr Caused by: java.io.FileNotFoundException: /data1/apache-nifi/content_repository/676/1498156300076-3748 (Too many open files)
2017-06-22 13:37:49,516 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.io.FileInputStream.open0(Native Method)
2017-06-22 13:37:49,516 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.io.FileInputStream.open(FileInputStream.java:195)
2017-06-22 13:37:49,516 ERROR [NiFi logging handler] org.apache.nifi.StdErr at java.io.FileInputStream.<init>(FileInputStream.java:138)
I'm guessing this means I've hit some kind of open-file threshold.
Could you please let me know which of the content repository properties I should increase so that more files can wait in the queue to be merged?
nifi.properties:
# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
# nifi.content.repository.directory.default=./content_repository
nifi.content.repository.directory.default=/data1/apache-nifi/content_repository
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/
Created 06-22-2017 09:12 PM
Hi @Raj B,
I'd certainly recommend using multiple MergeContent processors in succession instead of a single one. If your trigger is size and you want to end up with a 100 MB file, use a first MergeContent to merge the small files into 10 MB files, then a second one to merge those into a single 100 MB file. That's the typical approach with MergeContent (and SplitText) processors to avoid this kind of issue; see the sketch of the two-stage setup below.
Hope this helps.
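For reference, a minimal sketch of what the two stages might look like, using MergeContent's standard Bin-Packing Algorithm properties. The entry counts and group sizes here are illustrative assumptions, not values taken from this thread; tune them to your own flow.
First MergeContent (small files -> ~10 MB bundles):
    Merge Strategy            = Bin-Packing Algorithm
    Merge Format              = Binary Concatenation
    Minimum Number of Entries = 1
    Maximum Number of Entries = 20000
    Minimum Group Size        = 10 MB
    Maximum Group Size        = 12 MB
Second MergeContent (~10 MB bundles -> one ~100 MB file):
    Merge Strategy            = Bin-Packing Algorithm
    Merge Format              = Binary Concatenation
    Minimum Number of Entries = 1
    Maximum Number of Entries = 1000
    Minimum Group Size        = 100 MB
    Maximum Group Size        = 110 MB
Splitting the work this way means each processor only has to read a much smaller number of FlowFiles per bin, so far fewer content claim files need to be open at the same time.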
Created 06-22-2017 08:20 PM
Hi @Raj B
This link might help you: https://community.hortonworks.com/questions/8871/nifi-routetext-processor-too-many-open-files.html
Created 06-23-2017 01:07 PM
@Sonu Sahi thanks; I'm going to try what @Pierre Villard suggested first, before I go this route.
Created 06-23-2017 01:08 PM
@Pierre Villard thanks, I'll give it a shot.
Created 06-24-2017 10:46 PM
@Pierre Villard, chaining 2 MergeContent Processors, as you suggested, worked for me; thank you.
Created 03-31-2020 07:16 AM
@pvillard How does this work exactly? I'm having issues segmenting large files as well. When I split them, do I split multiple times, or just once and then recombine them successively? Thanks for your help!
Created 06-23-2017 01:17 PM
increase your ulimit
https://easyengine.io/tutorials/linux/increase-open-files-limit/
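In case it helps, a quick sketch of checking and raising the open-files limit on a typical Linux box. The limit of 50000 and the "nifi" service user are assumptions; adjust for your environment and restart NiFi afterwards.
    # check the current soft limit for open files (run as the user that launches NiFi)
    ulimit -n
    # raise it for the current shell session only
    ulimit -n 50000
    # make it persistent by adding lines like these to /etc/security/limits.conf
    # nifi  soft  nofile  50000
    # nifi  hard  nofile  50000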
Created 06-24-2017 10:45 PM
@tspann, thank you