
How to increase Flume file size in HDFS?


Hey,

 

I'm trying to get Flume to write bigger files when putting Twitter data into HDFS. Currently there are a lot of 1 MB files, but I want fewer files of about 64 MB each.

This is my configuration:

TwitterAgent.sources = twitter
TwitterAgent.channels = memoryChannel
TwitterAgent.sinks = HDFS
 
TwitterAgent.sources.twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.twitter.consumerKey = x
TwitterAgent.sources.twitter.consumerSecret = x
TwitterAgent.sources.twitter.accessToken =  x-x
TwitterAgent.sources.twitter.accessTokenSecret = x
TwitterAgent.sources.twitter.keywords = wm2014
TwitterAgent.sources.twitter.maxBatchDurationMillis = 200 
TwitterAgent.sources.twitter.channels = memoryChannel
 
TwitterAgent.channels.memoryChannel.type = memory
TwitterAgent.channels.memoryChannel.capacity = 10000
TwitterAgent.channels.memoryChannel.transactionCapacity = 10000
 
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = memoryChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 66584576
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
 
Also, why does the keywords line not work? I'm getting all tweets, not just the ones matching the keyword.
 
Thanks!

Re: How to increase Flume file size in HDFS?


Try this:

 

delete: TwitterAgent.sources.twitter.maxBatchDurationMillis = 200 

 

and put: TwitterAgent.sinks.HDFS.hdfs.batchSize = 64000

 

maxBatchDurationMillis is the maximum time the source waits before closing a batch, so 200 ms is too short. Just leave it out so the default applies instead of restricting it.
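
One more thing that may be worth checking, as a sketch rather than a confirmed fix: the posted config sets hdfs.rollSize and hdfs.rollCount but never sets hdfs.rollInterval. As far as I know, the HDFS sink's default rollInterval is 30 seconds, so a file would still be closed every 30 seconds no matter how small it is, which could explain the many 1 MB files. Also, the sink takes hdfs.batchSize events from the channel in a single transaction, so it probably should not be larger than the memory channel's transactionCapacity (10000 in the config above). The values below are assumptions for illustration and have not been tested against this setup:

```properties
# Roll on size only: 0 disables the time-based and count-based triggers.
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
# 64 MB = 64 * 1024 * 1024 bytes
TwitterAgent.sinks.HDFS.hdfs.rollSize = 67108864
# Assumed value: keep the sink batch no larger than the channel's transactionCapacity (10000).
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
```

With time-based rolling disabled, a file stays open until it reaches rollSize, so on a slow stream it can take a while before a finished file shows up.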

 

--
Lefevre Kevin