
Flume doesn't deliver flat file into HDFS




I have two agents running on different servers. I've configured them to send text files from one machine to the other over Avro. When the file arrives at HDFS, I see only a tiny .tmp file. If I stop the second agent, which saves the file into HDFS, the file is renamed and I see the entire file there.


Could you help me please?


I'm attaching both conf files and the log:


First conf:


Agent1.sources = spooldir-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink

# Describe/configure Source
Agent1.sources.spooldir-source.type = spooldir
Agent1.sources.spooldir-source.spoolDir = /path1
Agent1.sources.spooldir-source.inputCharset = ISO-8859-1
Agent1.sources.spooldir-source.fileSuffix = .OK
Agent1.sources.spooldir-source.decodeErrorPolicy = IGNORE

# Describe the sink
Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname =
Agent1.sinks.avro-sink.port = 58000

#Use a channel which buffers events in file
#Agent1.channels.file-channel.type = memory
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.capacity = 10000
Agent1.channels.file-channel.transactionCapacity = 10000
Agent1.channels.file-channel.write-timeout = 60
Agent1.channels.file-channel.checkpointDir = /pathcp
Agent1.channels.file-channel.dataDirs = /pathdd

# Bind the source and sink to the channel
Agent1.sources.spooldir-source.channels = file-channel
Agent1.sinks.avro-sink.channel = file-channel


Conf 2:


Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink

# Describe/configure Source
Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.bind =
Agent2.sources.avro-source.port = 58000

# Describe the sink
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = /pathHdfs
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent2.sinks.hdfs-sink.hdfs.rollCount = 0
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent2.sinks.hdfs-sink.hdfs.rollSize = 268435456
Agent2.sinks.hdfs-sink.hdfs.batchSize = 10000


#Use a channel which buffers events in file
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointInterval = 300000
Agent2.channels.file-channel.keep-alive = 1
Agent2.channels.file-channel.checkpointOnClose = true
Agent2.channels.file-channel.checkpointDir = /pathcd
Agent2.channels.file-channel.dataDirs = /pathdd

# Bind the source and sink to the channel
Agent2.sources.avro-source.channels = file-channel
Agent2.sinks.hdfs-sink.channel = file-channel





18/04/11 11:56:41 INFO hdfs.BucketWriter: Creating /pathHdfs/FlumeData.1523458601607.tmp

^C18/04/11 12:00:15 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 10

18/04/11 12:00:15 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping

18/04/11 12:00:15 INFO hdfs.HDFSEventSink: Closing /pathHdfs/FlumeData

18/04/11 12:00:15 INFO hdfs.BucketWriter: Closing /pathHdfs/FlumeData.1523458601607.tmp

18/04/11 12:00:15 INFO hdfs.BucketWriter: Renaming /pathHdfs/FlumeData.1523458601607.tmp to /pathHdfs/FlumeData.1523458601607


Re: Flume doesn't deliver flat file into HDFS

While a file is open in HDFS, you won't see its true size; only once the file is closed is the correct size reflected. If you cat the file while it is still open, you should see the entire amount of data that has passed through Flume so far. You can also set hdfs.idleTimeout to instruct Flume to close the file after it has been idle for that many seconds, i.e. once all the lines from your spooldir have been delivered.
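
For example, on Agent2's HDFS sink it could look like the lines below (the 60-second value is only illustrative; pick one that suits how often new files land in your spooldir):

# Close an idle .tmp file after 60 seconds with no new events (value is an assumption)
Agent2.sinks.hdfs-sink.hdfs.idleTimeout = 60

With rollInterval and rollCount both set to 0, as in your config, idleTimeout (or reaching rollSize) becomes the main trigger for closing and renaming the file.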
