How to increase Flume file size in HDFS?

I'm trying to get Flume to write bigger files when putting Twitter data into HDFS. Currently there are a lot of files of about 1 MB each, but I want fewer files of about 64 MB.

This is my configuration:

TwitterAgent.sources = twitter
TwitterAgent.channels = memoryChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.twitter.consumerKey = x
TwitterAgent.sources.twitter.consumerSecret = x
TwitterAgent.sources.twitter.accessToken = x-x
TwitterAgent.sources.twitter.accessTokenSecret = x
TwitterAgent.sources.twitter.keywords = wm2014
TwitterAgent.sources.twitter.maxBatchDurationMillis = 200
TwitterAgent.sources.twitter.channels = memoryChannel
TwitterAgent.channels.memoryChannel.type = memory
TwitterAgent.channels.memoryChannel.capacity = 10000
TwitterAgent.channels.memoryChannel.transactionCapacity = 10000
TwitterAgent.sinks.HDFS.channel = memoryChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 66584576
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true

Also, why does the keywords line not work? I'm getting all tweets, not just the ones matching the keyword.

Try this:

Delete: TwitterAgent.sources.twitter.maxBatchDurationMillis = 200

And set: TwitterAgent.sinks.HDFS.hdfs.batchSize = 64000

maxBatchDurationMillis caps how long the source collects tweets before closing a batch, so 200 ms is too short. Just don't put a restriction on it.
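
One further setting may be worth checking. This is an assumption about the setup, not something confirmed in the question: the HDFS sink also rolls files on a timer, and hdfs.rollInterval defaults to 30 seconds when it is not set, so a file can be closed long before hdfs.rollSize is reached. A minimal sketch that turns off time-based rolling so that only the size limit applies:

```properties
# Assumption: only size-based rolling is wanted.
# hdfs.rollInterval defaults to 30 (seconds); 0 disables time-based rolling.
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0

# Settings already present in the question, repeated here for context.
TwitterAgent.sinks.HDFS.hdfs.rollSize = 66584576
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
```

With rollInterval at 0, a file stays open until it reaches rollSize, so on a slow stream the in-progress file may stay open for a long time. If that becomes a problem, hdfs.idleTimeout (disabled by default) can close files that have received no writes for a given number of seconds.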


Lefevre Kevin