Hi, I am using Flume to copy files from a spooling directory to HDFS, using a file channel. Here is my configuration:
# Component names
a1.sources = src
a1.channels = c1
a1.sinks = k1

# Source details
a1.sources.src.type = spooldir
a1.sources.src.channels = c1
a1.sources.src.spoolDir = /home/cloudera/onetrail
a1.sources.src.fileHeader = false
a1.sources.src.basenameHeader = true
# a1.sources.src.basenameHeaderKey = basename
a1.sources.src.fileSuffix = .COMPLETED
a1.sources.src.threads = 4
a1.sources.src.interceptors = newint
a1.sources.src.interceptors.newint.type = timestamp

# Sink details
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs:///data/contentProviders/cnet/%Y%m%d/
# a1.sinks.k1.hdfs.round = false
# a1.sinks.k1.hdfs.roundValue = 1
# a1.sinks.k1.hdfs.roundUnit = second
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
# a1.sinks.k1.hdfs.file.Type = DataStream
a1.sinks.k1.hdfs.filePrefix = %{basename}
# a1.sinks.k1.hdfs.fileSuffix = .xml
a1.sinks.k1.threadsPoolSize = 4
# use a single file at a time
a1.sinks.k1.hdfs.maxOpenFiles = 1
# rollover file based on maximum size of 10 MB
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.batchSize = 12

# Channel details
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /tmp/flume/checkpoint/
a1.channels.c1.dataDirs = /tmp/flume/data/

# Bind the source and sink to the channel
a1.sources.src.channels = c1
a1.sinks.k1.channels = c1
With the above configuration Flume is able to copy the files to HDFS, but the problem I am facing is that one file always stays as .tmp and the complete file content is not copied.
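My understanding (from the Flume user guide, so please correct me if this is wrong) is that the HDFS sink writes the current file with an in-use .tmp suffix and only renames it when the file is rolled or the sink is shut down; since hdfs.rollCount, hdfs.rollInterval and hdfs.rollSize are all 0, rolling is effectively disabled. Below is a sketch of how I think the relevant roll/close settings would look; the 10 MB size and 60 second idle timeout are only example values I have not tested:

# roll by size only: close and rename the .tmp file once ~10 MB has been written
a1.sinks.k1.hdfs.rollSize = 10485760
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
# also close a file that has received no events for 60 seconds
a1.sinks.k1.hdfs.idleTimeout = 60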
Can someone help me with what the problem could be?