Created 02-19-2019 05:01 PM
We have a setup where Flume reads from Kafka and writes to HDFS.
We have a requirement to have Flume read data from Kafka for only one particular day, so we want to filter the data by timestamp.
For example, we want to read only the data for February 17th, 2019.
Each record contains the date in the format "2019-02-17".
I tried the following in my Flume configuration, but it did not work:
flume_agent.sources.kafka1.interceptors = regex
flume_agent.sources.kafka1.interceptors.regex.type = regex_filter
flume_agent.sources.kafka1.interceptors.regex.regex = 2019-02-17
flume_flat_agent.sources.kafka1.interceptors.regex.includeEvents = true
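A possible cause, offered as a sketch rather than a confirmed fix: the last line above uses the agent name `flume_flat_agent` while the other three use `flume_agent`, and the Flume user guide documents `excludeEvents` (default `false`) for the `regex_filter` interceptor, not `includeEvents`. Assuming the agent is named `flume_agent` and the source `kafka1` throughout, the interceptor could be written as:

```properties
# Assumption: the agent is named flume_agent and the Kafka source kafka1
flume_agent.sources.kafka1.interceptors = regex
flume_agent.sources.kafka1.interceptors.regex.type = regex_filter
# Keep only events whose body contains this date string
flume_agent.sources.kafka1.interceptors.regex.regex = 2019-02-17
# false (the default) keeps matching events; true would drop them instead
flume_agent.sources.kafka1.interceptors.regex.excludeEvents = false
```

This also assumes the event body is plain text that contains the date. The interceptor runs after the source has consumed each message, so it drops non-matching events on the way to HDFS but does not limit what is read from Kafka.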
I would appreciate any insights on how to achieve this.