Support Questions
Find answers, ask questions, and share your expertise

Move files from a spooling directory to HDFS with flume

Solved

Explorer

Hi, I am using Flume to copy files from a spooling directory to HDFS, using a file channel.

#Component names
a1.sources = src
a1.channels = c1
a1.sinks = k1

#Source details
a1.sources.src.type = spooldir
a1.sources.src.channels = c1
a1.sources.src.spoolDir = /home/cloudera/onetrail
a1.sources.src.fileHeader = false
a1.sources.src.basenameHeader = true
# a1.sources.src.basenameHeaderKey = basename
a1.sources.src.fileSuffix = .COMPLETED
a1.sources.src.threads = 4
a1.sources.src.interceptors = newint
a1.sources.src.interceptors.newint.type = timestamp

#Sink details
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs:///data/contentProviders/cnet/%Y%m%d/
# a1.sinks.k1.hdfs.round = false
# a1.sinks.k1.hdfs.roundValue = 1
# a1.sinks.k1.hdfs.roundUnit = second
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
#a1.sinks.k1.hdfs.file.Type = DataStream
a1.sinks.k1.hdfs.filePrefix = %{basename}
# a1.sinks.k1.hdfs.fileSuffix = .xml
a1.sinks.k1.hdfs.threadsPoolSize = 4

# use a single file at a time
a1.sinks.k1.hdfs.maxOpenFiles = 1

# rollover file based on maximum size of 10 MB
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.batchSize = 12

# Channel details
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /tmp/flume/checkpoint/
a1.channels.c1.dataDirs = /tmp/flume/data/

# Bind the source and sink to the channel
a1.sources.src.channels = c1
a1.sinks.k1.channel = c1

With the above configuration it is able to copy the files to HDFS, but the problem I am facing is that one file keeps staying as .tmp and its content is never completely written.

Can someone help me with what the problem could be?

 
1 ACCEPTED SOLUTION

Accepted Solutions

Re: Move files from a spooling directory to HDFS with flume

Super Collaborator
You have specified that all roll values are zero:
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 0

This means the latest file will never roll (and since you have hdfs.maxOpenFiles = 1, it stays open as .tmp). I'd suggest adding hdfs.idleTimeout if you want to make sure the file rolls after it has been ingested and sent to HDFS.

-pd
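For reference, a minimal sketch of that change appended to the sink section (the 60-second value is just an illustrative choice; pick something longer than any expected quiet period while a file is still being ingested):

# close (roll) the open HDFS file after 60 seconds with no new events
a1.sinks.k1.hdfs.idleTimeout = 60

With idleTimeout set, the .tmp file is closed and renamed once the source goes idle, even though rollCount, rollInterval, and rollSize are all disabled.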
2 REPLIES 2


Re: Move files from a spooling directory to HDFS with flume

New Contributor

Please explain how to transfer data from the local file system to HDFS using the Taildir Flume source. My use case deals with real-time data, so the files in the source directory keep getting updated.
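A minimal sketch of a Taildir source, assuming Flume 1.7 or later (where TAILDIR was introduced); the paths and group name are illustrative:

a1.sources = r1
a1.sources.r1.type = TAILDIR
# JSON file where Flume records the last read position of each tailed file
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
# one or more file groups, each a regex matching the files to tail
a1.sources.r1.filegroups = fg1
a1.sources.r1.filegroups.fg1 = /home/cloudera/logs/.*\.log
a1.sources.r1.channels = c1

Unlike spooldir, Taildir tails files as they grow and resumes from the recorded position after a restart, which fits a directory whose files keep being updated.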
