We are using Cloudera 5.16.2 and have noticed that the Flume HDFS sink appends data to existing files in the directory a few hours, or sometimes even days, after the initial file was closed. I thought Flume never appends to an existing file. Is there a configuration parameter that forces Flume to write to a new file once an existing file has already been closed by Flume and renamed (i.e., the .tmp extension removed)?
We discovered this by periodically logging the directory listing and noticing sudden jumps in file sizes a few hours or a few days later. It doesn't happen to all files, only to some of them.
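For reference, the periodic check described above can be sketched roughly as follows. This is a hypothetical helper, not something Flume provides: it assumes two saved snapshots of `hdfs dfs -ls` output in the default eight-column layout (sizes in raw bytes, no `-h`), and that Flume's in-use suffix is the default `.tmp`. It reports files that were already closed (no `.tmp` suffix) in the earlier snapshot and are larger in the later one. The paths in the sample are made up for illustration.

```python
"""Hypothetical helper: compare two saved `hdfs dfs -ls` snapshots of the
sink directory and report closed files (no .tmp suffix) whose size grew."""

from typing import Dict, List, Tuple

IN_USE_SUFFIX = ".tmp"  # assumption: Flume's default in-use suffix


def parse_listing(text: str) -> Dict[str, int]:
    """Map path -> size in bytes from `hdfs dfs -ls` output.

    Assumes the default layout: perms, replication, owner, group,
    size, date, time, path. Lines like "Found N items" and directory
    entries are skipped.
    """
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split(None, 7)
        if len(fields) != 8 or fields[0].startswith("d"):
            continue  # header line or directory, not a file entry
        try:
            sizes[fields[7]] = int(fields[4])
        except ValueError:
            continue  # unexpected format; ignore the line
    return sizes


def grown_closed_files(before: str, after: str) -> List[Tuple[str, int, int]]:
    """Return (path, old_size, new_size) for files that were already
    closed in `before` and are larger in `after`."""
    old = parse_listing(before)
    new = parse_listing(after)
    grown = []
    for path, new_size in sorted(new.items()):
        if path.endswith(IN_USE_SUFFIX):
            continue  # still open; growth is expected
        old_size = old.get(path)  # None if the file was still .tmp before
        if old_size is not None and new_size > old_size:
            grown.append((path, old_size, new_size))
    return grown


if __name__ == "__main__":
    # Hypothetical sample snapshots (paths and sizes are invented).
    before = """Found 2 items
-rw-r--r--   3 flume flume   1024 2019-11-05 10:15 /flume/logs/FlumeData.1572948900000.snappy
-rw-r--r--   3 flume flume    512 2019-11-05 10:20 /flume/logs/FlumeData.1572949200000.snappy.tmp
"""
    after = """Found 2 items
-rw-r--r--   3 flume flume   4096 2019-11-06 09:00 /flume/logs/FlumeData.1572948900000.snappy
-rw-r--r--   3 flume flume    900 2019-11-05 10:30 /flume/logs/FlumeData.1572949200000.snappy
"""
    for path, old_size, new_size in grown_closed_files(before, after):
        print(f"{path}: {old_size} -> {new_size} bytes after close")
```

In the sample, only the first file is flagged: it was already closed in the earlier snapshot and grew afterwards. The second file grew too, but it was still `.tmp` (open) at the first snapshot, so that growth is normal.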
Our source is production logs, and the sink is HDFS, writing Snappy-compressed files.