New Contributor
Posts: 1
Registered: ‎04-11-2018

Flume doesn't deliver flat file into HDFS




I have two agents running on different servers. I've configured them to send text files from one machine to the other over Avro. When a file arrives in HDFS, I only see a tiny .tmp file. If I stop the second agent (the one that writes to HDFS), the file is renamed and I can see the entire file there.


Could you help me please?


I'm sending both conf files and the log:


First conf:


Agent1.sources = spooldir-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink

# Describe/configure Source
Agent1.sources.spooldir-source.type = spooldir
Agent1.sources.spooldir-source.spoolDir = /path1
Agent1.sources.spooldir-source.inputCharset = ISO-8859-1
Agent1.sources.spooldir-source.fileSuffix = .OK
Agent1.sources.spooldir-source.decodeErrorPolicy = IGNORE

# Describe the sink
Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname =
Agent1.sinks.avro-sink.port = 58000

#Use a channel which buffers events in file
#Agent1.channels.file-channel.type = memory
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.capacity = 10000
Agent1.channels.file-channel.transactionCapacity = 10000
Agent1.channels.file-channel.write-timeout = 60
Agent1.channels.file-channel.checkpointDir = /pathcp
Agent1.channels.file-channel.dataDirs = /pathdd

# Bind the source and sink to the channel
Agent1.sources.spooldir-source.channels = file-channel
Agent1.sinks.avro-sink.channel = file-channel


Conf 2:


Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink

# Describe/configure Source
Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.bind =
Agent2.sources.avro-source.port = 58000

# Describe the sink
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = /pathHdfs
Agent2.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent2.sinks.hdfs-sink.hdfs.rollCount = 0
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent2.sinks.hdfs-sink.hdfs.rollSize = 268435456
Agent2.sinks.hdfs-sink.hdfs.batchSize = 10000


#Use a channel which buffers events in file
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointInterval = 300000
Agent2.channels.file-channel.keep-alive = 1
Agent2.channels.file-channel.checkpointOnClose = true
Agent2.channels.file-channel.checkpointDir = /pathcd
Agent2.channels.file-channel.dataDirs = /pathdd

# Bind the source and sink to the channel
Agent2.sources.avro-source.channels = file-channel
Agent2.sinks.hdfs-sink.channel = file-channel





18/04/11 11:56:41 INFO hdfs.BucketWriter: Creating /pathHdfs/FlumeData.1523458601607.tmp

^C18/04/11 12:00:15 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 10

18/04/11 12:00:15 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping

18/04/11 12:00:15 INFO hdfs.HDFSEventSink: Closing /pathHdfs/FlumeData

18/04/11 12:00:15 INFO hdfs.BucketWriter: Closing /pathHdfs/FlumeData.1523458601607.tmp

18/04/11 12:00:15 INFO hdfs.BucketWriter: Renaming /pathHdfs/FlumeData.1523458601607.tmp to /pathHdfs/FlumeData.1523458601607

Cloudera Employee
Posts: 216
Registered: ‎01-09-2014

Re: Flume doesn't deliver flat file into HDFS

While a file is open in HDFS, you won't see its true size; only when the file is closed is the correct size reflected. If you cat the file while it is still open, you should see the entire amount of data that has passed through Flume. You can also set hdfs.idleTimeout to instruct Flume to close the file once it has been idle, i.e. after all the lines from your spooldir have been delivered.
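A minimal sketch of that setting against the Agent2 conf above (the 60-second value is illustrative, not from the original thread; pick something comfortably longer than the gaps between your spooldir deliveries):

# Close (and rename away from .tmp) the HDFS file after 60 seconds
# with no new events arriving at the sink.
Agent2.sinks.hdfs-sink.hdfs.idleTimeout = 60

Note that with rollInterval, rollCount set to 0 and rollSize at 256 MB as in your conf, idleTimeout is effectively the only thing that will close small files, so without it a partially filled file stays open (as .tmp) until the agent shuts down, which matches what you are seeing.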
