I'm trying to get Flume to write larger files when storing Twitter data in HDFS. Currently it produces a lot of files of about 1 MB each, but I want fewer files of about 64 MB.
This is my configuration:
Delete: TwitterAgent.sources.twitter.maxBatchDurationMillis = 200
And add: TwitterAgent.sinks.HDFS.hdfs.batchSize = 64000
The duration limits how long a batch may accumulate before it is passed on to be written to HDFS, so 200 milliseconds is too short. Just don't put a restriction on it.
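Put together, the relevant part of the agent file might look roughly like the sketch below. It reuses the agent, source, and sink names from the lines above. The three hdfs.roll* lines are not part of the change described above; they are the HDFS sink's file-rolling properties, added here on the assumption that the sink's roll defaults are also cutting files short. Treat the values as a starting point to test, not a confirmed fix.

```properties
# maxBatchDurationMillis is deliberately left out, so the source's default applies.

# Number of events written to a file before it is flushed to HDFS.
TwitterAgent.sinks.HDFS.hdfs.batchSize = 64000

# Assumption: roll a new file only once it reaches 64 MB (64 * 1024 * 1024 bytes).
TwitterAgent.sinks.HDFS.hdfs.rollSize = 67108864
# Assumption: 0 disables rolling by event count.
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
# Assumption: 0 disables rolling by elapsed seconds.
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
```

With the count and interval triggers disabled, a file is only closed once it hits the size limit, so a slow stream can leave a file open for a long time.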