Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

NiFi error - Too many open files

Expert Contributor

Hi All,

I'm running into an issue while trying to merge a large number of small files in NiFi. I have about 800K files (350 MB total) in the queue at a MergeContent processor; I'm waiting to accumulate about 1.2 million files to merge into one large file, but the MergeContent processor is throwing the error below:

MergeContent[id=3104122b-1077-115c-2e71-b264709ceb44] Failed to process bundle of 897788 files due to org.apache.nifi.processor.exception.FlowFileAccessException: Failed to read content of StandardFlowFileRecord[uuid=a2a32c84-f633-4a7a-8b82-2ba5547db9af,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1498156308912-3769, container=default, section=697], offset=429054, length=436953],offset=104885,name=9b425a01-a759-42b6-bcf6-67f9bc79c871,size=302]; rolling back sessions: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to read content of StandardFlowFileRecord[uuid=a2a32c84-f633-4a7a-8b82-2ba5547db9af,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1498156308912-3769, container=default, section=697], offset=429054, length=436953],offset=104885,name=9b425a01-a759-42b6-bcf6-67f9bc79c871,size=302]
2017-06-22 13:37:49,515 ERROR [NiFi logging handler] org.apache.nifi.StdErr Caused by: java.io.FileNotFoundException: /data1/apache-nifi/content_repository/676/1498156300076-3748 (Too many open files)
2017-06-22 13:37:49,516 ERROR [NiFi logging handler] org.apache.nifi.StdErr 	at java.io.FileInputStream.open0(Native Method)
2017-06-22 13:37:49,516 ERROR [NiFi logging handler] org.apache.nifi.StdErr 	at java.io.FileInputStream.open(FileInputStream.java:195)
2017-06-22 13:37:49,516 ERROR [NiFi logging handler] org.apache.nifi.StdErr 	at java.io.FileInputStream.<init>(FileInputStream.java:138)

I'm thinking this suggests that I'm over some kind of threshold.

Would you please let me know which of the content repository properties I should increase to allow more files to wait in the queue to be merged?

nifi.properties:

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
# nifi.content.repository.directory.default=./content_repository
nifi.content.repository.directory.default=/data1/apache-nifi/content_repository
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/
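For what it's worth, the "Too many open files" message points at the operating system's per-process file-descriptor limit rather than any NiFi repository property. A minimal sketch of checking and raising it (the user name and limit values below are illustrative, not from this thread):

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n

# To raise it persistently for the user running NiFi, add lines like these
# to /etc/security/limits.conf, then log in again and restart NiFi:
#   nifi  soft  nofile  50000
#   nifi  hard  nofile  50000
```

With hundreds of thousands of FlowFiles in one bin, MergeContent may need to open a very large number of content claims at once, which is why the limit is hit.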
1 ACCEPTED SOLUTION


Hi @Raj B,

I'd certainly recommend using multiple successive MergeContent processors instead of one. If your trigger is size (say you want to end up with a 100 MB file), then I'd use a first MergeContent to merge the small files into 10 MB files, and a second one to merge those into one 100 MB file. That's a typical approach with the MergeContent and SplitText processors to avoid such issues.

Hope this helps.


8 REPLIES

Guru

Expert Contributor

@Sonu Sahi thanks; I'm going to try what @Pierre Villard suggested first, before I go this route.


Hi @Raj B,

I'd certainly recommend using multiple successive MergeContent processors instead of one. If your trigger is size (say you want to end up with a 100 MB file), then I'd use a first MergeContent to merge the small files into 10 MB files, and a second one to merge those into one 100 MB file. That's a typical approach with the MergeContent and SplitText processors to avoid such issues.

Hope this helps.
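The two-stage merge above might be configured roughly like this (property names are from the standard MergeContent processor; the sizes and entry counts are illustrative values, not from this thread):

```
# Stage 1 MergeContent: small files -> ~10 MB bundles
Merge Strategy            = Bin-Packing Algorithm
Minimum Group Size        = 10 MB
Maximum Number of Entries = 20000

# Stage 2 MergeContent: ~10 MB bundles -> one ~100 MB file
Merge Strategy            = Bin-Packing Algorithm
Minimum Group Size        = 100 MB
```

Keeping the number of entries per bin modest at each stage is what avoids opening hundreds of thousands of content claims in a single merge.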

Expert Contributor

@Pierre Villard thanks, I'll give it a shot.

Expert Contributor

@Pierre Villard, chaining 2 MergeContent Processors, as you suggested, worked for me; thank you.

New Member

@pvillard How does this work exactly? I'm having issues segmenting large files as well. When I split them, do I split multiple times, or just once and then recombine them successively? Thanks for your help!

Master Guru

Expert Contributor

@tspann, thank you