It looks like you're approaching this as a batch process rather than a real-time one. In a real-time, streaming flow there is no start or end to a unit of work. Everything is always on, and the processors we are discussing are designed to carve that continuous flow into user-defined chunks of work. What you're describing, however, is a very specific unit of work, defined by the set of files you receive. To window it properly, you have to understand, at least to some degree, the parameters of the work being done on those files. What is the upper bound on the number of records you can expect in the largest payload? How long will that largest payload take to process? The answers will give you an idea of how to set those parameters.

Additionally, in a real-time workflow you should not have to combine everything back into one file, since that is a serial step and will become a bottleneck. I would suggest taking a look at your downstream processes and considering whether they can be parallelized.
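To make the two sizing questions a bit more concrete, here is a rough Python sketch of the idea. It is not the configuration of any particular processor. The names `window`, `max_records`, `max_age_seconds`, and `process_chunks_in_parallel` are all made up for illustration. A chunk is closed by whichever limit is reached first, and each chunk is then handled independently instead of being merged back into one file.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def window(records: Iterable[T],
           max_records: int,
           max_age_seconds: float,
           clock: Callable[[], float] = time.monotonic) -> Iterator[List[T]]:
    """Yield chunks closed by whichever limit is hit first: size or age.

    Simplification: the age limit is only checked when a record arrives,
    so an idle stream will not close a chunk on its own here.
    """
    chunk: List[T] = []
    opened_at = 0.0
    for record in records:
        if not chunk:
            opened_at = clock()  # the window opens with its first record
        chunk.append(record)
        if len(chunk) >= max_records or clock() - opened_at >= max_age_seconds:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # flush the remainder when a finite input ends


def process_chunks_in_parallel(chunks: Iterable[List[T]],
                               handler: Callable[[List[T]], R],
                               workers: int = 4) -> List[R]:
    """Run a hypothetical downstream handler on each chunk independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks
        return list(pool.map(handler, chunks))


if __name__ == "__main__":
    # Size limit from the largest expected payload, age limit from how
    # long you are willing to wait; both values here are placeholders.
    chunks = window(range(7), max_records=3, max_age_seconds=60.0)
    print(process_chunks_in_parallel(chunks, sum))
```

In this sketch `max_records` would come from your answer to the first question (largest expected payload) and `max_age_seconds` from the second (how long the largest payload takes), with some headroom on both.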