HDF NiFi in Docker: MergeContent processor concerns

We are consuming from Kafka using HDF NiFi in a Docker container, and we currently have the attached flow. We want to keep bundling until a max age of 1 hour so that a single file gets pushed to HDFS. We made this work by setting the bin's min/max number of files and min/max size very high and letting Max Bin Age trigger bundle completion.

As many of you know, HDFS hates small files, especially when they arrive frequently. The issue with MergeContent is that it holds the bundle in memory until it is complete and pushed to HDFS. If there were a way to build the bundle on disk, that would be fine. Are there design alternatives? If this fails, we are thinking of just writing a Java application to handle Kafka -> HDFS.

47386-kafka-consume-layout.png

Re: HDF NiFi in Docker: MergeContent processor concerns

@Michael DeGuzis

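You could do the merge in two steps instead of one: a first MergeContent that builds smaller intermediate bundles, followed by a second MergeContent that combines those bundles into the final hourly file. That would reduce the amount of time the individual files are held in memory while the merged file is created.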
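
To get a rough feel for why splitting the merge helps, here is a small back-of-the-envelope sketch in Python. It is not NiFi configuration or NiFi API; the function name and the file counts are made up for illustration, and it only models how many entries a single bin has to track at once in a one-step merge versus a two-step merge.

```python
import math
from typing import Optional


def peak_entries_per_bin(total_files: int, first_stage_bin: Optional[int] = None) -> int:
    """Largest number of entries any one bin must track at once.

    total_files      -- hypothetical number of files arriving in the hour
    first_stage_bin  -- hypothetical entry limit of the first merge step;
                        None models a single merge step
    """
    if first_stage_bin is None:
        # One merge step: a single bin accumulates every incoming file.
        return total_files

    # Step one: each bin closes once it reaches first_stage_bin entries.
    stage_one_peak = min(first_stage_bin, total_files)
    # Step two: one entry per bundle produced by step one.
    stage_two_peak = math.ceil(total_files / first_stage_bin)
    return max(stage_one_peak, stage_two_peak)


# Hypothetical load: one million small files per hour.
print(peak_entries_per_bin(1_000_000))          # single merge step
print(peak_entries_per_bin(1_000_000, 1_000))   # two merge steps
```

With these assumed numbers the single-step merge has to track all 1,000,000 entries in one bin, while the two-step version never tracks more than 1,000 in any bin. The actual limits to use depend on your message rate and sizes.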