Support Questions

Find answers, ask questions, and share your expertise

File Name and Variable in Flume


Hi All,

Right now I am working on a project where we are trying to read the Tomcat access log using Flume, process the data in Spark, and load it into a DB in the proper format. The problem is that the Tomcat access log is a daily rolling file, so the file name changes every day. Something like localhost_access_log.2017-09-17.txt becoming localhost_access_log.2017-09-18.txt.


and the source section of my Flume conf file looks like this:

# Describe/configure the source
flumePullAgent.sources.nc1.type = exec
flumePullAgent.sources.nc1.command = tail -F /tomcatLog/localhost_access_log.2017-09-17.txt
#flumePullAgent.sources.nc1.selector.type = replicating

This runs a tail command on a fixed file name (I used a fixed name for testing only). How can I pass the file name as a parameter in the Flume conf file?
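One possible sketch, assuming the exec source's optional `shell` property in Flume 1.x: wrap the command in a shell so the date can be substituted when the agent starts. Note the substitution is evaluated only once, at startup, so this alone does not follow the daily rollover.

```
# Run the command through a shell so $(date ...) is expanded (Flume exec source)
flumePullAgent.sources.nc1.type = exec
flumePullAgent.sources.nc1.shell = /bin/bash -c
flumePullAgent.sources.nc1.command = tail -F /tomcatLog/localhost_access_log.$(date +%Y-%m-%d).txt
```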

In fact, even if I could somehow pass the file name as a parameter, that still would not be a real solution. Say I start Flume today with some file name (for example, "localhost_access_log.2017-09-19.txt"); tomorrow, when the file name changes (localhost_access_log.2017-09-19.txt to localhost_access_log.2017-09-20.txt), someone has to stop Flume and restart it with the new file name. It would not be a continuous process; I would have to stop/start Flume with a cron job or something similar. Another problem is that I would lose some data every day during that window (the server we are working with is a high-throughput server, almost 700-800 TPS), i.e. the time it takes for the new file name to appear plus the time to stop and restart Flume.

Does anyone have an idea how to run Flume against rolling file names in a production environment? Any help will be highly appreciated.


Super Guru

@Biswajit Chakraborty

You might be able to use the TAILDIR source.
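A minimal sketch, assuming Flume 1.7 or later (where TAILDIR was introduced): it tails every file matching a regex, tracks read offsets in a position file, and picks up each day's new log automatically, so no restart is needed on rollover. The position-file path here is an arbitrary example.

```
# TAILDIR follows all files matching the filegroup regex and survives daily rollover
flumePullAgent.sources.nc1.type = TAILDIR
flumePullAgent.sources.nc1.positionFile = /var/lib/flume/taildir_position.json
flumePullAgent.sources.nc1.filegroups = f1
flumePullAgent.sources.nc1.filegroups.f1 = /tomcatLog/localhost_access_log\..*\.txt
```

Unlike the exec source, TAILDIR also persists its position across agent restarts, so data is not re-read or silently skipped.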

Curious why you are using Flume instead of NiFi? Flume has been deprecated, and you can do what you are trying to do easily in NiFi with a simple drag-and-drop interface, while monitoring your data as it flows, plus the ability to throttle and prioritize, see data lineage, and a lot more.