Created on 08-22-2016 03:02 AM - edited 09-16-2022 03:35 AM
I am using the Flume spooling directory source to put files into HDFS, but I am getting many small files in HDFS. I thought of using batch size and roll interval, but I don't want to depend on size and interval. So I decided to push files into the Flume spool directory one at a time. How can I do this? Please help.
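For reference, a minimal sketch of the kind of spooldir-to-HDFS setup described here; the agent, source, channel, and sink names and the paths are assumptions for illustration, not from the original post:

# hypothetical agent layout (names assumed)
agent1.sources = spoolSrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# spooling directory source watching a local directory (path assumed)
agent1.sources.spoolSrc.type = spooldir
agent1.sources.spoolSrc.spoolDir = /var/flume/spool
agent1.sources.spoolSrc.channels = memCh

agent1.channels.memCh.type = memory

# HDFS sink writing to a target directory (path assumed)
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memCh
agent1.sinks.hdfsSink.hdfs.path = /data/flume/events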
Created 09-01-2016 12:05 AM
You can refer to the HDFS sink's timestamp escape sequences; there are a lot of them, and you can use them accordingly.
Example: you can use HDFS bucketing, for instance one bucket per hour.

agent1.sinks.hdfsSinks.hdfs.path = /data/flume/%{aa}/%y/%m/%d/%H/%M
agent1.sinks.hdfsSinks.hdfs.round = true
agent1.sinks.hdfsSinks.hdfs.roundUnit = hour
agent1.sinks.hdfsSinks.hdfs.roundValue = 1
Created 08-31-2016 11:54 PM
Try using the timestamp interceptor.
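A minimal sketch of how the timestamp interceptor is typically attached to a source; the agent and source names here are assumptions:

# attach the built-in timestamp interceptor to the source (names assumed)
agent1.sources.spoolSrc.interceptors = ts
agent1.sources.spoolSrc.interceptors.ts.type = timestamp

The interceptor stamps each event with a timestamp header, which the HDFS sink's time-based escape sequences (%y, %m, %d, %H, %M) need in order to resolve the bucketed path.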