
Regarding MergeContent Processor.

Explorer

I have a flow that connects to a remote server with the GetSFTP processor, fetches all files from a specified folder, and should then create a single zip containing all of them.

Because the flow generates a separate FlowFile for each file and passes them to the MergeContent processor, the result is not a single zip of all the files. The FlowFiles arrive at regular intervals, so the final zip contains only a few of them.

Is there any solution for this, so that the final zip contains all the files?

Thanks in advance.

3 REPLIES

Re: Regarding MergeContent Processor.

Expert Contributor
@Gaurav Jain

If you know the number of files on the SFTP server upfront, you can set "Minimum Number of Entries" in MergeContent to that count. That is the number of files a bin must hold before it is marked complete and forwarded to the next processor.

Re: Regarding MergeContent Processor.

Explorer

I don't know the number of files on the SFTP server in advance.

Re: Regarding MergeContent Processor.

Master Guru

@Gaurav Jain

Since NiFi has no way of knowing how many files are in the source directory on the SFTP server, it is difficult to guarantee that all files will end up in a single zip file.

If some latency is acceptable, you could try the following MergeContent configuration:

- Set "Minimum Number of Entries" to some value you know is larger than the number of files that will be returned. So if you know the number of files will range between 100 and 1000, set this to 2000, for example.

- Set "Maximum Number of Entries" to some value larger than the above value.

- Set "Max Bin Age" to a value that allows sufficient time for NiFi to get all files from the SFTP server. This is where the latency comes in. With this set, you are holding back the merge long enough to ensure all files have been consumed from the SFTP server. The idea is to trigger the merge based on bin age rather than on the number of entries.
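
To make the interaction of those three properties easier to see, here is a rough Python sketch of the decision as I described it. It is only a simplified model, not NiFi's actual code: `MergeSettings` and `bin_should_merge` are made-up names, the 3000 and the 300 seconds are assumed example values, and the real processor also takes size limits and other settings into account.

```python
from dataclasses import dataclass


@dataclass
class MergeSettings:
    """Hypothetical holder for the three MergeContent properties above."""
    min_entries: int            # "Minimum Number of Entries"
    max_entries: int            # "Maximum Number of Entries"
    max_bin_age_seconds: float  # "Max Bin Age" (assumed to be given in seconds here)

    def __post_init__(self):
        # The maximum should not be smaller than the minimum.
        if self.max_entries < self.min_entries:
            raise ValueError("max_entries must be >= min_entries")


def bin_should_merge(entry_count: int, bin_age_seconds: float,
                     settings: MergeSettings) -> bool:
    """Simplified model: merge when the bin is full enough OR old enough."""
    if entry_count == 0:
        return False  # nothing to merge yet
    if entry_count >= settings.min_entries:
        return True   # count-based trigger (deliberately set out of reach)
    return bin_age_seconds >= settings.max_bin_age_seconds  # age-based trigger


# Minimum set above the expected 100-1000 files, so only bin age can trigger.
settings = MergeSettings(min_entries=2000, max_entries=3000,
                         max_bin_age_seconds=300)
```

With these values, a bin holding 500 files is held back at 10 seconds of age and released once it is 300 seconds old, which is the behavior the configuration is aiming for.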

As you can see, this still does not guarantee your zip will contain all files if:

  1. GetSFTP takes longer to ingest all files than the configured "Max Bin Age".
  2. A network outage causes the connection to the SFTP server to be lost after only some files have been ingested.
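
The first failure case can be illustrated with a small simulation. Again, this is a sketch under assumptions rather than how NiFi is implemented: `group_into_zips` is a hypothetical helper, arrival times are plain seconds, and a bin's age is assumed to count from the arrival of its first file.

```python
def group_into_zips(arrival_seconds, min_entries, max_bin_age_seconds):
    """Group file arrival times into bins the way the simplified model would.

    A bin is closed when it reaches min_entries, or when a file arrives
    after the bin has already exceeded max_bin_age_seconds.
    """
    bins = []
    current = []
    bin_started = None
    for t in sorted(arrival_seconds):
        # The open bin aged out before this file arrived: it becomes its own zip.
        if current and t - bin_started >= max_bin_age_seconds:
            bins.append(current)
            current = []
        if not current:
            bin_started = t  # the first file in a bin starts its age clock
        current.append(t)
        # Count-based completion.
        if len(current) >= min_entries:
            bins.append(current)
            current = []
    if current:
        bins.append(current)  # the last bin is eventually merged by age
    return bins


# Three files arrive quickly, two more only after 400 seconds.
arrivals = [0, 1, 2, 400, 401]
print([len(b) for b in group_into_zips(arrivals, 2000, 300)])  # [3, 2]
print([len(b) for b in group_into_zips(arrivals, 2000, 600)])  # [5]
```

With a bin age of 300 seconds the late files end up in a second zip, while 600 seconds is long enough to collect all five into one. That is the trade-off between latency and completeness mentioned above.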

However, this may get you closer to your desired outcome and will work under normal operational conditions.

Thanks,

Matt