Concerns over losing data with HDF Nifi MergeContent in-memory

[Attached image: Kafka consume flow layout (47387-kafka-consume-layout.png)]

We are consuming from Kafka using HDF NiFi running in a Docker container, with the flow shown in the attached image. We want to bundle messages for up to a maximum age of 1 hour so that a single file gets pushed to HDFS. We made this work by setting the minimum/maximum number of files and the minimum/maximum bin size very high, so that Max Bin Age is what triggers bundle completion.
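The configuration above can be reasoned about with a simplified model of bin completion: a bin is roughly ready when both minimums are met, when either maximum is reached, or when it is older than Max Bin Age. The sketch below is an illustrative model only (the class `BinModel` and its fields are hypothetical, not NiFi's code), and real MergeContent has additional rules; for example, the Maximum number of Bins setting can force the oldest bin to be merged early. It shows why setting the count and size thresholds very high leaves Max Bin Age as the only trigger.

```java
import java.time.Duration;

/**
 * Hypothetical, simplified model of when a MergeContent-style bin is ready to merge.
 * Not NiFi's implementation: it tracks only entry counts and byte totals.
 */
public class BinModel {
    final long minEntries, maxEntries, minBytes, maxBytes;
    final long maxBinAgeMillis;
    final long createdAtMillis;
    long entries = 0;
    long bytes = 0;

    BinModel(long minEntries, long maxEntries, long minBytes, long maxBytes,
             Duration maxBinAge, long createdAtMillis) {
        this.minEntries = minEntries;
        this.maxEntries = maxEntries;
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
        this.maxBinAgeMillis = maxBinAge.toMillis();
        this.createdAtMillis = createdAtMillis;
    }

    /** Records one flow file of the given size in this bin. */
    void add(long sizeBytes) {
        entries++;
        bytes += sizeBytes;
    }

    /** True if the bin would be merged at nowMillis under these simplified rules. */
    boolean isComplete(long nowMillis) {
        boolean minimumsMet = entries >= minEntries && bytes >= minBytes;
        boolean maximumHit = entries >= maxEntries || bytes >= maxBytes;
        boolean tooOld = nowMillis - createdAtMillis >= maxBinAgeMillis;
        return minimumsMet || maximumHit || tooOld;
    }

    public static void main(String[] args) {
        // All count/size thresholds "very high", so only the 1-hour Max Bin Age can complete the bin.
        BinModel bin = new BinModel(Long.MAX_VALUE, Long.MAX_VALUE,
                Long.MAX_VALUE, Long.MAX_VALUE, Duration.ofHours(1), 0L);
        for (int i = 0; i < 1000; i++) {
            bin.add(1024);
        }
        System.out.println(bin.isComplete(Duration.ofMinutes(59).toMillis())); // false
        System.out.println(bin.isComplete(Duration.ofHours(1).toMillis()));    // true
    }
}
```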

As many of you know, HDFS hates small files, especially when they arrive frequently. The issue with MergeContent is that it holds the bundle in memory until it is complete and pushed to HDFS. If the bundle could be built on disk instead, that seems OK. Are there design alternatives? If this fails, we are thinking of just writing a Java application to handle Kafka -> HDFS.
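For the custom-application route, the "build the bundle on disk" idea can be sketched as an append-only spool file that is closed once it reaches a maximum age. This is a hypothetical sketch, not a NiFi feature: `DiskSpoolBundler` is an illustrative name, the HDFS upload is omitted (a real application would hand the finished file to an HDFS client), and the current time is passed in explicitly so the rolling logic is easy to test.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.util.Optional;

/**
 * Hypothetical sketch: accumulate records in a local spool file (on disk, not in heap)
 * and close the bundle once it reaches a maximum age. Uploading to HDFS is not shown.
 */
public class DiskSpoolBundler {
    private final Path spoolDir;
    private final long maxAgeMillis;
    private Path current;          // file currently being appended to, or null
    private long openedAtMillis;
    private int sequence = 0;

    public DiskSpoolBundler(Path spoolDir, Duration maxAge) throws IOException {
        this.spoolDir = Files.createDirectories(spoolDir);
        this.maxAgeMillis = maxAge.toMillis();
    }

    /** Appends one newline-delimited record; returns the finished bundle if the age limit was reached. */
    public Optional<Path> append(String record, long nowMillis) throws IOException {
        Optional<Path> finished = rollIfExpired(nowMillis);
        if (current == null) {
            current = spoolDir.resolve("bundle-" + (sequence++) + ".inprogress");
            openedAtMillis = nowMillis;
        }
        Files.write(current, (record + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return finished;
    }

    /** Closes the current bundle if it is at least maxAge old, renaming it to mark it complete. */
    public Optional<Path> rollIfExpired(long nowMillis) throws IOException {
        if (current == null || nowMillis - openedAtMillis < maxAgeMillis) {
            return Optional.empty();
        }
        String name = current.getFileName().toString().replace(".inprogress", ".done");
        Path done = Files.move(current, current.resolveSibling(name));
        current = null;
        return Optional.of(done);
    }
}
```

A finished `.done` file survives a crash because it is on disk, but an application built this way would also need to pick up leftover `.inprogress` files on startup and, when consuming from Kafka, commit offsets only after the bundle is safely in HDFS so that records are not dropped.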

Re: Concerns over losing data with HDF Nifi MergeContent in-memory

You should not lose any data in this scenario. The state of all flow files is stored on disk in the flow file repository, and the content of those flow files is stored on disk in the content repository. If NiFi were to crash or restart, it would read the flow file repository to restore the 736 flow files to the queue before MergeContent, and MergeContent would then re-bin them while waiting for the configured thresholds.
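The recovery argument in this reply (state on disk, queue restored on restart, then re-binned) can be illustrated with a toy model. This is not how NiFi's flow file repository is implemented; `ToyQueueRecovery` and its methods are hypothetical and only show why state persisted to disk lets a restarted process rebuild the queue and re-bin everything.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model, not NiFi's repository code: queue membership is persisted so a restart can rebuild it. */
public class ToyQueueRecovery {
    /** Writes the queued ids to a temp file, then moves it over the previous snapshot. */
    static void persistQueue(Path repo, List<String> queuedIds) throws IOException {
        Path tmp = repo.resolveSibling(repo.getFileName() + ".tmp");
        Files.write(tmp, queuedIds, StandardCharsets.UTF_8);
        Files.move(tmp, repo, StandardCopyOption.REPLACE_EXISTING);
    }

    /** On restart, rebuilds the in-memory queue from the last snapshot on disk. */
    static Deque<String> restoreQueue(Path repo) throws IOException {
        return new ArrayDeque<>(Files.readAllLines(repo, StandardCharsets.UTF_8));
    }

    /** Drains the restored queue into a fresh bin, as the merge step would after restarting. */
    static List<String> rebin(Deque<String> queue) {
        List<String> bin = new ArrayList<>();
        while (!queue.isEmpty()) {
            bin.add(queue.poll());
        }
        return bin;
    }

    public static void main(String[] args) throws IOException {
        Path repo = Files.createTempDirectory("toy-repo").resolve("queue.snapshot");
        persistQueue(repo, Arrays.asList("ff-1", "ff-2", "ff-3"));
        // Simulated crash: the in-memory bin is gone, but the snapshot on disk remains.
        List<String> bin = rebin(restoreQueue(repo));
        System.out.println(bin); // [ff-1, ff-2, ff-3]
    }
}
```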

Re: Concerns over losing data with HDF Nifi MergeContent in-memory

We have this running in Docker, and that is the issue: because this runs in memory, that is where we see the risk.

Re: Concerns over losing data with HDF Nifi MergeContent in-memory

Unless you are using the volatile repository implementations, I don't see how it would matter whether you are in Docker or not.

Doesn't your NiFi instance in Docker still have access to some kind of disk?
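One way to check the point about volatile repositories is to look at the repository settings in the instance's `nifi.properties`. The sketch below is a small stand-alone helper (the class `RepoConfigCheck` is hypothetical, not part of NiFi) that flags any flow file or content repository implementation whose class name contains `Volatile`, and lists the configured repository directories so you can confirm they sit on a Docker volume or bind mount. The property keys are NiFi's standard ones; matching on the substring `Volatile` is a simplifying assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: inspects nifi.properties for volatile repositories and repository directories. */
public class RepoConfigCheck {
    static final String[] IMPLEMENTATION_KEYS = {
        "nifi.flowfile.repository.implementation",
        "nifi.content.repository.implementation"
    };

    /** Returns "key=value" for each repository implementation whose class name contains "Volatile". */
    static List<String> volatileRepositories(Properties props) {
        List<String> hits = new ArrayList<>();
        for (String key : IMPLEMENTATION_KEYS) {
            String impl = props.getProperty(key, "");
            if (impl.contains("Volatile")) {
                hits.add(key + "=" + impl);
            }
        }
        return hits;
    }

    /** Collects the flow file and content repository directories, sorted by key. */
    static Map<String, String> repositoryDirectories(Properties props) {
        Map<String, String> dirs = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.equals("nifi.flowfile.repository.directory")
                    || key.startsWith("nifi.content.repository.directory.")) {
                dirs.put(key, props.getProperty(key));
            }
        }
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            props.load(reader);
        }
        List<String> hits = volatileRepositories(props);
        System.out.println(hits.isEmpty() ? "No volatile repositories configured." : "Volatile: " + hits);
        // Each of these paths should be on a Docker volume or bind mount to outlive the container.
        repositoryDirectories(props).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Data written to a container's writable layer survives a container restart but not removal of the container, so repositories that must survive a redeploy belong on a mounted volume.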


Re: Concerns over losing data with HDF Nifi MergeContent in-memory