Created on 11-02-2019 01:47 PM - last edited on 11-02-2019 02:05 PM by ask_bill_brooks
I am quite new to NiFi. We want to download a CSV file from an FTP server and insert it into HDFS.
I have got this working with a small test file using the GetSFTP and PutHDFS processors. We now want to work with our first real data file of around 15 GB, and later with data files possibly in the 200-500 GB range. Are there any restrictions on the size of the files that can be ingested, and if so, how can these be overcome?
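For reference, the test flow is just those two processors, configured roughly along these lines (the hostname and paths below are placeholders, not our real values):

    GetSFTP
      Hostname:    sftp.example.com
      Port:        22
      Username:    nifi_user
      Remote Path: /exports/data.csv

    PutHDFS
      Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
      Directory:                      /landing/csv
      Conflict Resolution Strategy:   replace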
Thanks very much in advance
Created 11-04-2019 06:00 AM
NiFi places no restriction on the size of the data that can be processed.
Ingested data becomes the content portion of a NiFi FlowFile and is written to the content repository. That data is not read again unless a processor actually needs to read the content; otherwise, only the FlowFile attributes/metadata are passed from one processor component to the next. So you need to make sure you have sufficient storage space for the NiFi content_repository. It is also strongly recommended that this be dedicated storage, separate from any other NiFi repository.
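As a sketch, the repository locations are set in nifi.properties; the mount points below are illustrative examples only, showing the content repository on its own disk:

    # Content repository on its own dedicated disk/mount
    nifi.content.repository.directory.default=/data1/nifi/content_repository

    # Other repositories kept on separate storage
    nifi.flowfile.repository.directory=/data2/nifi/flowfile_repository
    nifi.provenance.repository.directory.default=/data3/nifi/provenance_repository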
Beyond that, the only limits here are network and disk I/O throughput.
Thanks,
Matt