Ingest Large Data Files

New Contributor

I am quite new to NiFi. We want to download a CSV file from an SFTP server and insert it into HDFS.

I have got this working with a small test file by using the GetSFTP and PutHDFS processors. We now want to work with our first actual data file of around 15 GB, and later data files possibly in the 200-500 GB range. Are there any restrictions on the size of the files that can be ingested, and if so, how can these be overcome?

Thanks very much in advance

1 ACCEPTED SOLUTION

Re: Ingest Large Data Files

Master Guru

@pxm 

 

NiFi sets no restriction on the size of the data that can be processed.

Ingested data becomes the content portion of a NiFi FlowFile and is written to the content repository. The content is not read again unless a processor needs it; otherwise, only the FlowFile attributes/metadata are passed from one component to the next. So you need to make sure you have sufficient storage space for the NiFi content_repository. It is also strongly recommended that the content repository sit on dedicated storage, separate from any other NiFi repository.

Beyond that, any limitation here will come from network and disk I/O throughput.
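
As a rough way to sanity-check both points before pulling a 200-500 GB file, here is a minimal Python sketch. It is not part of NiFi: the function names, the 20% headroom factor, and the example path and throughput are assumptions chosen for illustration, so substitute the mount point of your own content repository and a throughput you have actually measured.

```python
import shutil

GB = 1000 ** 3  # decimal gigabyte, in bytes


def has_room(repo_path: str, incoming_bytes: int, headroom: float = 1.2) -> bool:
    """Return True if the volume holding repo_path has enough free space.

    headroom is an assumed safety factor (20% extra by default), not a
    NiFi setting. It leaves slack for other FlowFile content that may
    share the same repository.
    """
    free_bytes = shutil.disk_usage(repo_path).free
    return free_bytes >= incoming_bytes * headroom


def transfer_hours(size_bytes: int, throughput_bytes_per_s: float) -> float:
    """Estimate transfer time in hours at a sustained throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


if __name__ == "__main__":
    repo = "."  # hypothetical: replace with the content_repository mount point
    assumed_throughput = 100 * 10**6  # assumed 100 MB/s sustained, measure your own
    for size_gb in (15, 200, 500):
        size = size_gb * GB
        fits = has_room(repo, size)
        hours = transfer_hours(size, assumed_throughput)
        print(f"{size_gb} GB: fits={fits}, about {hours:.2f} h at the assumed rate")
```

The estimate only covers a single leg at one sustained rate; the SFTP download and the HDFS write each have their own throughput, so the slower of the two is the one to plug in.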

 

Thanks,

Matt
