Hi All,
Right now I am working on a project where we are trying to read the Tomcat
access log using Flume, process the data in Spark, and dump it into a
DB in the proper format. The problem is that the Tomcat access log is a
daily rolling file, so the file name changes every day. Something
like:
localhost_access_log.2017-09-19.txt
localhost_access_log.2017-09-18.txt
localhost_access_log.2017-09-17.txt
The source section of my Flume conf file looks something like this:
# Describe/configure the source
flumePullAgent.sources.nc1.type = exec
flumePullAgent.sources.nc1.command = tail -F /tomcatLog/localhost_access_log.2017-09-17.txt
#flumePullAgent.sources.nc1.selector.type = replicating
This runs the tail command on a fixed file name (I used a fixed name
for testing only). How can I pass the file name as a parameter in the
Flume conf file?
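For illustration, one way to pass a dated file name might be to let the exec source run its command through a shell, so that `$(date ...)` is expanded when the command starts. A minimal sketch, assuming bash is available at `/bin/bash` and that the server's local date matches the date in Tomcat's file names:

```properties
# Describe/configure the source
flumePullAgent.sources.nc1.type = exec
# Run the command through a shell so that $(date ...) is expanded
flumePullAgent.sources.nc1.shell = /bin/bash -c
# The date is substituted only once, when the source starts the command,
# so tail keeps following the same file after midnight
flumePullAgent.sources.nc1.command = tail -F /tomcatLog/localhost_access_log.$(date +%Y-%m-%d).txt
```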
In fact, even if I somehow manage to pass the file name as a parameter,
it will still not be a real solution. Say I start Flume today with one
file name (for example "localhost_access_log.2017-09-19.txt"). Tomorrow,
when the file name changes (from localhost_access_log.2017-09-19.txt to
localhost_access_log.2017-09-20.txt), someone has to stop Flume and
restart it with the new file name. In that case it will not be a
continuous process; I would have to stop/start Flume using a cron job or
something similar. Another problem is that I will lose some data every
day during the switchover (the server we are working on is a
high-throughput server, almost 700-800 TPS). By switchover I mean the
time it takes to generate the new file name + the time to stop Flume +
the time to start Flume.
Does anyone have an idea how to run Flume with a rolling file name in a
production environment? Any help will be highly appreciated...
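A possible direction, not tested here: Flume 1.7.0 and later include a Taildir source (`type = TAILDIR`) that follows every file whose name matches a regular expression and records its read offsets in a JSON position file, so a newly created daily file might be picked up without restarting the agent. A minimal sketch, assuming Flume 1.7+; the channel name `ch1` and the position-file path are hypothetical:

```properties
# Taildir source: follows all files matching the filename regex below
flumePullAgent.sources.nc1.type = TAILDIR
flumePullAgent.sources.nc1.channels = ch1
# One file group; a regex is allowed only in the file-name part of the path
flumePullAgent.sources.nc1.filegroups = f1
flumePullAgent.sources.nc1.filegroups.f1 = /tomcatLog/localhost_access_log.*.txt
# Hypothetical path where read offsets are persisted between restarts
flumePullAgent.sources.nc1.positionFile = /var/flume/taildir_position.json
```

Because offsets are persisted, a restart of the agent should resume from the last recorded position rather than re-reading or skipping lines, though that behaviour is worth verifying at 700-800 TPS.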