Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to put files in flume spooldir one by one

Explorer

I am using the Flume spooldir source to put files into HDFS, but I am getting a lot of small files in HDFS. I thought of using the batch size and roll interval, but I don't want to depend on size and interval. So I decided to push files into the Flume spooldir one at a time. How can I do this? Please help.

1 ACCEPTED SOLUTION

Champion

You can refer to the HDFS sink's timestamp escape sequences; there are a lot of them, and you can use whichever fits.

Example:

You can use HDFS bucketing, for example one bucket per hour.

agent1.sinks.hdfsSinks.hdfs.path = /data/flume/%{aa}/%y/%m/%d/%H/%M
agent1.sinks.hdfsSinks.hdfs.round = true
agent1.sinks.hdfsSinks.hdfs.roundUnit = hour
agent1.sinks.hdfsSinks.hdfs.roundValue = 1
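
As a fuller sketch of how the bucketing above might sit in a complete agent, the fragment below wires a spooling-directory source to the HDFS sink. The names spoolSrc and memCh, the directory /var/spool/flume, and the idle timeout value are assumptions for illustration, not taken from the original post. The %{aa} header is left out of the path, and so is %M, since with rounding down to the hour %M would always resolve to 00. The time-based escape sequences need a timestamp on each event; this sketch assumes the sink's local time is acceptable and sets hdfs.useLocalTimeStamp for that.

```properties
# Hypothetical agent layout: one spooldir source, one memory channel, one HDFS sink.
agent1.sources = spoolSrc
agent1.channels = memCh
agent1.sinks = hdfsSinks

# Spooling-directory source (the directory path is an assumption).
agent1.sources.spoolSrc.type = spooldir
agent1.sources.spoolSrc.spoolDir = /var/spool/flume
agent1.sources.spoolSrc.channels = memCh

agent1.channels.memCh.type = memory

# HDFS sink with hourly buckets, as in the answer above.
agent1.sinks.hdfsSinks.type = hdfs
agent1.sinks.hdfsSinks.channel = memCh
agent1.sinks.hdfsSinks.hdfs.path = /data/flume/%y/%m/%d/%H
agent1.sinks.hdfsSinks.hdfs.round = true
agent1.sinks.hdfsSinks.hdfs.roundUnit = hour
agent1.sinks.hdfsSinks.hdfs.roundValue = 1

# Use the sink's local time for the escape sequences instead of an event header.
agent1.sinks.hdfsSinks.hdfs.useLocalTimeStamp = true

# Turn off count-, size- and interval-based rolling (0 disables each one).
agent1.sinks.hdfsSinks.hdfs.rollCount = 0
agent1.sinks.hdfsSinks.hdfs.rollSize = 0
agent1.sinks.hdfsSinks.hdfs.rollInterval = 0

# Close a bucket's file once it has been idle this many seconds (assumed value).
agent1.sinks.hdfsSinks.hdfs.idleTimeout = 300
```

With rolling disabled this way, each hourly bucket should end up with far fewer files, but the right idle timeout depends on how often files arrive in the spool directory.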


3 REPLIES

Champion

Try using the timestamp interceptor. 

Explorer
Do you have any examples?
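
A minimal sketch of the timestamp interceptor suggested above, assuming a source named spoolSrc on agent1 (the source name and the interceptor alias ts are hypothetical). The interceptor stamps each event with a timestamp header, which the HDFS sink's time-based escape sequences such as %y, %m, %d and %H read when building the path.

```properties
# Attach a timestamp interceptor to the (hypothetically named) source spoolSrc.
agent1.sources.spoolSrc.interceptors = ts
agent1.sources.spoolSrc.interceptors.ts.type = timestamp
```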
