
Cloudera Flume seems to be appending data to existing HDFS files

Hi, 

We are using Cloudera 5.16.2 and have noticed that the Flume HDFS sink appends data to existing files in the target directory a few hours, or sometimes even days, after the initial file was closed. My understanding was that Flume never appends to an existing file. Is there a configuration parameter that forces Flume to write to a new file once it has closed an existing one and renamed it to remove the .tmp extension?

We discovered this by periodically logging the directory listing and noticing sudden jumps in file sizes a few hours or a few days later. It does not happen to all files, only to some of them.
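
The detection approach described above (periodically capturing the directory listing and comparing file sizes) can be sketched in Python. This is a minimal sketch, assuming each snapshot is the text output of `hdfs dfs -ls` on the sink directory and that the sink uses Flume's default in-use suffix `.tmp` (`hdfs.inUseSuffix`). The function names `parse_ls` and `find_grown_closed_files` and the sample paths and sizes are hypothetical. It flags only files that were already closed (no `.tmp` suffix) in the earlier snapshot but are larger in the later one.

```python
from typing import Dict, List, Tuple


def parse_ls(output: str) -> Dict[str, int]:
    """Parse `hdfs dfs -ls` text into {path: size_in_bytes}.

    Skips the 'Found N items' header, blank lines, and directories.
    """
    sizes: Dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        # File entries: perms, replication, owner, group, size, date, time, path
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        path = " ".join(fields[7:])
        sizes[path] = int(fields[4])
    return sizes


def find_grown_closed_files(
    before: Dict[str, int],
    after: Dict[str, int],
    in_use_suffix: str = ".tmp",  # assumed default hdfs.inUseSuffix
) -> List[Tuple[str, int, int]]:
    """Return (path, old_size, new_size) for closed files that grew."""
    grown = []
    for path, old_size in before.items():
        if path.endswith(in_use_suffix):
            continue  # still open by Flume, so growth is expected
        new_size = after.get(path)
        if new_size is not None and new_size > old_size:
            grown.append((path, old_size, new_size))
    return sorted(grown)


# Hypothetical snapshots taken some time apart
SAMPLE_BEFORE = """Found 3 items
-rw-r--r--   3 flume supergroup       1024 2019-10-01 10:00 /logs/app/FlumeData.1569924000000.snappy
-rw-r--r--   3 flume supergroup        800 2019-10-01 10:02 /logs/app/FlumeData.1569924120000.snappy
-rw-r--r--   3 flume supergroup        512 2019-10-01 10:05 /logs/app/FlumeData.1569924300000.snappy.tmp
"""

SAMPLE_AFTER = """Found 3 items
-rw-r--r--   3 flume supergroup       4096 2019-10-01 10:00 /logs/app/FlumeData.1569924000000.snappy
-rw-r--r--   3 flume supergroup        800 2019-10-01 10:02 /logs/app/FlumeData.1569924120000.snappy
-rw-r--r--   3 flume supergroup       2048 2019-10-01 10:10 /logs/app/FlumeData.1569924300000.snappy
"""

grown = find_grown_closed_files(parse_ls(SAMPLE_BEFORE), parse_ls(SAMPLE_AFTER))
for path, old, new in grown:
    print(f"{path}: {old} -> {new} bytes")
# prints "/logs/app/FlumeData.1569924000000.snappy: 1024 -> 4096 bytes"
```

The `.tmp` file that was later renamed is not reported, because growth of a file Flume still holds open is normal; only a closed file that grows afterwards matches the symptom described in this post.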

Our source is production logs, and the sink is HDFS, writing Snappy-compressed files.

Thanks,

Leo
