We are consuming from Kafka using HDF NiFi in a Docker container. We currently have the attached flow. We want to keep bundling records until the bin reaches a maximum age of 1 hour, so that a single file gets pushed to HDFS. We made this possible by setting the bin's minimum and maximum number of files and size very high, and letting Max Bin Age trigger the bundle completion.
As many of you know, HDFS copes poorly with small files, especially when they arrive frequently. The issue with MergeContent is that it holds the data in memory until the bundle is complete and pushed to HDFS. If there were a way to build the bundle on disk instead, that would be fine. Are there design alternatives? We are thinking of writing a Java application to handle Kafka -> HDFS if this fails.
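
To make the fallback idea concrete, here is a rough sketch of the on-disk bundling we have in mind, not a finished design. The class name `DiskBundler`, its methods, and the newline-delimited record format are all hypothetical; the Kafka consumer loop and the HDFS upload are deliberately left out. Records are appended to a local file, and once the file is older than the configured maximum age it is handed back to the caller as a finished bundle.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.LongSupplier;

/** Hypothetical helper: accumulates records in a local file and rolls it by age. */
class DiskBundler {
    private final Path dir;
    private final long maxAgeMillis;
    private final LongSupplier clock; // injectable so the age check can be tested
    private Path current;             // bundle currently being written, or null
    private long openedAt;            // clock value when the current bundle was created

    DiskBundler(Path dir, long maxAgeMillis, LongSupplier clock) {
        this.dir = dir;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Appends one record; returns a finished bundle if the previous one aged out, else null. */
    Path append(byte[] record) {
        try {
            Path finished = rollIfExpired();
            if (current == null) {
                current = Files.createTempFile(dir, "bundle-", ".part");
                openedAt = clock.getAsLong();
            }
            // Reopening per record keeps the sketch simple; a real version would keep the stream open.
            try (OutputStream out = Files.newOutputStream(current, StandardOpenOption.APPEND)) {
                out.write(record);
                out.write('\n');
            }
            return finished;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the current bundle and forgets it if it has reached the maximum age, else null. */
    Path rollIfExpired() {
        if (current != null && clock.getAsLong() - openedAt >= maxAgeMillis) {
            Path done = current;
            current = null;
            return done;
        }
        return null;
    }
}
```

In use, the consumer loop would call `append` for each Kafka record with a maximum age of one hour and `System::currentTimeMillis` as the clock, call `rollIfExpired` periodically so that an idle topic still flushes, and copy any returned file to HDFS before deleting it locally. Only file handles and the record being written stay in memory, which is the property we are after.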