New Contributor
Posts: 4
Registered: ‎07-27-2018

Flume - How does hdfs sink flush data to disk?

[ Edited ]


I'm setting up Flume to take data from kafka and write to hdfs, using hdfs sink.

This is the sink's conf:

tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /tmp/flume/%y-%m-%d
tier1.sinks.sink1.hdfs.rollInterval = 3600
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.round = true
tier1.sinks.sink1.hdfs.roundvalue = 60
tier1.sinks.sink1.hdfs.roundUnit = minutes
tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
tier1.sinks.sink1.hdfs.filePrefix = %H

And the relevant kafka topic has 4 partitions.


I expected this to do one of th following:

1. create 4 .tmp files and keep writing to them until the defined rolling policy is applied. 

2. create 4 .tmp files and buffer some data somewhere untill the defined rolling policy is applied and then weire th data to the files.


I can't explain the behaviour that I see.

4 .tmp files are created. Some very small data is wrtiten to them (around 100k each file). and that it for thr whole hours.

When an hour is passed and the rolling policy is applied, then all the daa is written to them and files are changed to be without .tmp.


Can some one please explain this behaviour?




Posts: 1,697
Kudos: 341
Solutions: 264
Registered: ‎07-31-2013

Re: Flume - How does hdfs sink flush data to disk?

Since your question revolves around attempting to read the files when writing it, the HDFS reading-while-writing (tail) semantics are explained at (section 8.3.1, especially the notes about the hflush operation) and is worth going over.

In Flume's writer model, all events read from the channel are immediately serialized and placed into the configured output writers. When this writer is a HDFS type, this would mean all data goes to the DataNode replica pipeline in chunks of 64k or 128k (io.file.buffer.size config) local buffer flushes. These bytes will be in the HDFS file but they will not be normally visible until the file is closed or the next block is created (at the configured dfs.blocksize boundary).

Occasionally, Flume will also call the HDFS writer's hflush() call, whose function is to update the readable length of the file to whatever is the currently written bytes. Flume does this when the # of events touch the configured batch size (default: 100 events).

So you could expect that if you run a 'hdfs fs -tail -f' on a Flume-written open file, you will see data come out in broken periods of 100 or so events each, instead of freely flowing like you would expect on a local filesystem with a 4 KiB block/page size. This would also mean that if you just had 99 more entries written to the file at its end, and the rolling interval/size has not yet been reached, then you will only see the last written entries when the roll of file is performed (i.e. its closed and renamed to lose the .tmp).

What's the difference in size between the 100 KiB mark (as you've observed it stop at, per file) and the final size when the file is rolled by Flume? Is it enough to account for just upto 100 more new entries?