Created on 08-22-2016 03:02 AM - edited 09-16-2022 03:35 AM
I am using the Flume spooling directory source to put files into HDFS, but I am getting many small files in HDFS. I thought of using batch size and roll interval, but I don't want to depend on size and interval. So I decided to push files into the Flume spool directory one at a time. How can I do this? Please help.
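For reference, a minimal sketch of the kind of spooldir-to-HDFS setup described here; the agent, source, channel, and sink names and the paths are assumptions for illustration, not from the original post:

# hypothetical agent layout (names assumed)
agent1.sources = spoolSrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# spooling directory source watching a local directory (path assumed)
agent1.sources.spoolSrc.type = spooldir
agent1.sources.spoolSrc.spoolDir = /var/flume/spool
agent1.sources.spoolSrc.channels = memCh

agent1.channels.memCh.type = memory

# HDFS sink writing to a target directory (path assumed)
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memCh
agent1.sinks.hdfsSink.hdfs.path = /data/flume/events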
Created 09-01-2016 12:05 AM
You can refer to the HDFS sink's timestamp escape sequences; there are a lot of them, and you can use them accordingly.
Example: you can use HDFS bucketing, for instance one bucket per hour.

agent1.sinks.hdfsSinks.hdfs.path = /data/flume/%{aa}/%y/%m/%d/%H/%M
agent1.sinks.hdfsSinks.hdfs.round = true
agent1.sinks.hdfsSinks.hdfs.roundUnit = hour
agent1.sinks.hdfsSinks.hdfs.roundValue = 1
Created 08-31-2016 11:54 PM
Try using the timestamp interceptor.
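A minimal sketch of how the timestamp interceptor is typically attached to a source; the agent and source names here are assumptions:

# attach the built-in timestamp interceptor to the source (names assumed)
agent1.sources.spoolSrc.interceptors = ts
agent1.sources.spoolSrc.interceptors.ts.type = timestamp

The interceptor stamps each event with a timestamp header, which the HDFS sink's time-based escape sequences (%y, %m, %d, %H, %M) need in order to resolve the bucketed path.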