How to put files in flume spooldir one by one

Explorer

I am using a Flume spooldir source to put files into HDFS, but I am getting a lot of small files in HDFS. I thought of using the batch size and roll interval, but I don't want to depend on size and interval. So I decided to push files into the Flume spooldir one at a time. How can I do this? Please help.

Accepted Solution

Re: How to put files in flume spooldir one by one

Champion

You can refer to the HDFS sink's timestamp escape sequences; there are many of them, and you can combine them as needed.

For example, you can use HDFS bucketing to create one directory per hour:

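# %{aa} is replaced by the value of an event header named "aa"; the %y/%m/%d/%H/%M escapes
# need a timestamp header on each event (or hdfs.useLocalTimeStamp = true)
agent1.sinks.hdfsSinks.hdfs.path = /data/flume/%{aa}/%y/%m/%d/%H/%M
# Round the event timestamp down to the hour, so %M resolves to 00 and each hour gets one directory
agent1.sinks.hdfsSinks.hdfs.round = true
agent1.sinks.hdfsSinks.hdfs.roundUnit = hour
agent1.sinks.hdfsSinks.hdfs.roundValue = 1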
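
The many small files are most likely caused by the HDFS sink's default roll settings: unless overridden, it rolls a new file every 30 seconds (hdfs.rollInterval), every 1024 bytes (hdfs.rollSize) or every 10 events (hdfs.rollCount), whichever comes first. Below is a minimal sketch of a complete agent that combines the hourly bucketing above with roll settings aimed at roughly one file per hourly directory. The source, channel and spool directory names (spoolSrc, memCh, /var/flume/spool) are hypothetical placeholders, and the %{aa} part of the path is left out because nothing in this sketch sets an "aa" header.

```properties
# Hypothetical component names: spoolSrc, memCh, hdfsSinks
agent1.sources = spoolSrc
agent1.channels = memCh
agent1.sinks = hdfsSinks

# Spooling directory source (the spool path is a placeholder)
agent1.sources.spoolSrc.type = spooldir
agent1.sources.spoolSrc.spoolDir = /var/flume/spool
agent1.sources.spoolSrc.channels = memCh

agent1.channels.memCh.type = memory

# HDFS sink writing into one directory per hour
agent1.sinks.hdfsSinks.type = hdfs
agent1.sinks.hdfsSinks.channel = memCh
agent1.sinks.hdfsSinks.hdfs.path = /data/flume/%y/%m/%d/%H
agent1.sinks.hdfsSinks.hdfs.round = true
agent1.sinks.hdfsSinks.hdfs.roundValue = 1
agent1.sinks.hdfsSinks.hdfs.roundUnit = hour
# Use the agent's clock for the escape sequences instead of a timestamp header
agent1.sinks.hdfsSinks.hdfs.useLocalTimeStamp = true
# Write plain text instead of the default SequenceFile format
agent1.sinks.hdfsSinks.hdfs.fileType = DataStream
# Disable size- and count-based rolling; roll by time only, once per hour
agent1.sinks.hdfsSinks.hdfs.rollSize = 0
agent1.sinks.hdfsSinks.hdfs.rollCount = 0
agent1.sinks.hdfsSinks.hdfs.rollInterval = 3600
```

Setting a value to 0 disables that roll trigger, so here only the one-hour interval closes files. Tune the interval to your data volume; hdfs.idleTimeout is another option if you would rather close files after a period without writes.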

Re: How to put files in flume spooldir one by one

Champion

Try using the timestamp interceptor. 
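
A minimal sketch of attaching the timestamp interceptor to a spooling directory source, assuming the source is named spoolSrc (a hypothetical name). The interceptor adds a "timestamp" header to every event, and the HDFS sink's time-based escape sequences such as %y, %m, %d and %H read that header when building the path.

```properties
# Hypothetical source name: spoolSrc
agent1.sources.spoolSrc.interceptors = ts
# Built-in interceptor that stamps each event with the current time (epoch milliseconds)
agent1.sources.spoolSrc.interceptors.ts.type = timestamp
```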


Re: How to put files in flume spooldir one by one

Explorer
Do you have any examples?

Re: How to put files in flume spooldir one by one

Champion

You can refer Hdfs sink timestamp escape sequence , there is alot of them you can use accordingly . 

 

example 

U can use hdfs bucketing , for every one hour. 

agen1.sinks.hdfsSinks.hdfs.path = /data/flume/%{aa}/%y/%m/%d/%H/%M
agent1.sinks.hdfsSinks.hdfs.round = true
agen1.sinks.hdfsSinks.roundUnit = hour
agen1.sinks.hdfsSinks.roundValue = 1