Ingest Large Data Files

Explorer

I am quite new to NiFi. We want to download a CSV file from an FTP server and insert it into HDFS.

I have this working with a small test file using the GetSFTP and PutHDFS processors. We now want to work with our first actual data file of around 15 GB, and later with data files possibly in the 200-500 GB range. Are there any restrictions on the size of the files that can be ingested, and if so, how can they be overcome?

Thanks very much in advance

1 ACCEPTED SOLUTION

Super Mentor

@pxm

NiFi itself places no restriction on the size of data that can be processed.

Ingested data becomes the content portion of a NiFi FlowFile and is written to the content repository. The data is not read again unless a processor needs to read the content; otherwise, only the FlowFile attributes/metadata are passed from one processor component to the next. So you need to make sure you have sufficient storage space for the NiFi content_repository. It is also strongly recommended that this repository be on dedicated storage, separate from any other NiFi repository.
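For illustration, the repository locations are set in nifi.properties. A minimal sketch, assuming the standard property names; the mount-point paths are placeholders for whatever dedicated volumes you provision, not recommendations:

    # nifi.properties -- point each repository at its own disk/volume (example paths)
    nifi.content.repository.directory.default=/mnt/nifi-content/content_repository
    nifi.flowfile.repository.directory=/mnt/nifi-flowfile/flowfile_repository
    nifi.provenance.repository.directory.default=/mnt/nifi-provenance/provenance_repository

    # Optional: cap how much of the volume archived content may consume
    nifi.content.repository.archive.enabled=true
    nifi.content.repository.archive.max.usage.percentage=50%

Keeping the content repository on its own volume means a 200-500 GB ingest can fill its disk without starving the FlowFile or provenance repositories.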

Beyond that, any limitation here will come from network and disk I/O.

 

Thanks,

Matt
