
flume still keep the .tmp file and not copying the file completely to HDFS


Explorer

Hi, I am using Flume to copy files from a spooling directory to HDFS, using a file channel.

#Component names
a1.sources = src
a1.channels = c1
a1.sinks = k1

#Source details
a1.sources.src.type = spooldir
a1.sources.src.channels = c1
a1.sources.src.spoolDir = /home/cloudera/onetrail
a1.sources.src.fileHeader = false
a1.sources.src.basenameHeader = true
# a1.sources.src.basenameHeaderKey = basename
a1.sources.src.fileSuffix = .COMPLETED
a1.sources.src.threads = 4
a1.sources.src.interceptors = newint
a1.sources.src.interceptors.newint.type = timestamp

#Sink details
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs:///data/contentProviders/cnet/%Y%m%d/
# a1.sinks.k1.hdfs.round = false
# a1.sinks.k1.hdfs.roundValue = 1
# a1.sinks.k1.hdfs.roundUnit = second
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
#a1.sinks.k1.hdfs.file.Type = DataStream
a1.sinks.k1.hdfs.filePrefix = %{basename}
# a1.sinks.k1.hdfs.fileSuffix = .xml
a1.sinks.k1.threadsPoolSize = 4

# use a single file at a time
a1.sinks.k1.hdfs.maxOpenFiles = 1

# rollover file based on maximum size of 10 MB
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.batchSize = 12

# Channel details
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /tmp/flume/checkpoint/
a1.channels.c1.dataDirs = /tmp/flume/data/

# Bind the source and sink to the channel
a1.sources.src.channels = c1
a1.sinks.k1.channels = c1
With the above configuration Flume is able to copy the files to HDFS, but the problem I am facing is that one file keeps staying as .tmp and its contents are never completely copied.
Can someone help me figure out what the problem could be?

3 REPLIES

Re: flume still keep the .tmp file and not copying the file completely to HDFS

New Contributor

I have exactly the same problem :-( Did you find a solution in the meantime?

Re: flume still keep the .tmp file and not copying the file completely to HDFS

Super Guru
@Raghava Rao Annapareddy

In your sink settings, your comment says "# rollover file based on maximum size of 10 MB", yet a1.sinks.k1.hdfs.rollSize is set to 0, which means never roll based on size. You do have batchSize set to 12, which means Flume flushes to HDFS after every 12 events. Is it possible that what you are seeing in the .tmp file is fewer than 12 events that have not yet been flushed? Based on your settings, that file will remain open indefinitely. Here is what we know from the documentation:

Property           Default  Description
hdfs.rollInterval  30       Number of seconds to wait before rolling current file (0 = never roll based on time interval)
hdfs.rollSize      1024     File size to trigger roll, in bytes (0 = never roll based on file size)
hdfs.rollCount     10       Number of events written to file before it is rolled (0 = never roll based on number of events)
hdfs.idleTimeout   0        Timeout after which inactive files get closed (0 = disable automatic closing of idle files)
hdfs.batchSize     100      Number of events written to file before it is flushed to HDFS

Notice that, based on your settings, you will flush every 12 events but never close the file.
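
One way to get the file closed (and the .tmp suffix dropped) is to enable at least one roll trigger, or an idle timeout so the sink closes files that stop receiving events. A sketch of the relevant sink properties (values here are only examples, tune them for your workload):

a1.sinks.k1.hdfs.rollSize = 10485760
a1.sinks.k1.hdfs.idleTimeout = 60

The first line rolls the file once it reaches about 10 MB, matching the intent of your original comment; the second closes any file that has received no events for 60 seconds, so the last partial file of a batch does not stay as .tmp forever.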

Re: flume still keep the .tmp file and not copying the file completely to HDFS

New Contributor

Sorry, I overlooked your question. My problem is not exactly the same as yours!