Created on 10-12-2017 09:13 PM - edited 09-16-2022 05:23 AM
Hello, we are seeing some behavior that seems to indicate how this works, but I want to verify it with someone who knows how the TailFile processor behaves when tailing multiple files.
Here's our setup...
We have a two-node NiFi cluster. We are tailing a specific log, call it foo.log, which exists in several version directories under a versions folder.
To illustrate, we are tailing these files:
/var/foobar/versions/123.1/foo.log
/var/foobar/versions/234.2/foo.log
Now, on the initial run of the TailFile processor, the foo.log in 123.1 is no longer receiving data, because new data now goes to 234.2, the newer version. What we are seeing is that all tailed data comes only from 234.2 (which is awesome, and what we want to happen; we feared it would read the foo.log in 123.1, even though that file no longer receives data, in addition to the incoming data from 234.2).
Is it TailFile's behavior to tail only the files that are receiving data and skip the ones that aren't? That would suggest to me that when we roll out another version and
/var/foobar/versions/332.22/foo.log
appears, and data stops going into 234.2, it would stop tailing 234.2 (makes sense) and start pulling data from 332.22. Testing this is proving rather difficult, so I was hoping to get verification from someone who knows the functionality better.
PS: we have managed to use a regex to grab foo.log from any folder under versions whose name is made of digits and decimal points, so that part seems to be working.
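For illustration, here is a pattern of the kind described, checked against paths relative to the versions directory. The exact regex is hypothetical (the original one isn't shown), and the sketch assumes that with Recursive Lookup enabled the File(s) to Tail regex has to match the whole path relative to the Base Directory (e.g. /var/foobar/versions).

```python
import re

# Hypothetical pattern: a version directory made of digit groups separated
# by dots (123.1, 234.2, 332.22, ...), followed by /foo.log.
FOO_LOG = re.compile(r"\d+(?:\.\d+)*/foo\.log")

def is_tailed(relative_path: str) -> bool:
    """Return True if a path relative to the base directory matches the pattern."""
    # fullmatch models the assumption that the regex must match the entire path
    return FOO_LOG.fullmatch(relative_path) is not None

for p in ["123.1/foo.log", "234.2/foo.log", "332.22/foo.log",
          "latest/foo.log", "123.1/bar.log"]:
    print(p, is_tailed(p))
```

The first three paths match; latest/foo.log and 123.1/bar.log do not.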
Created on 10-13-2017 12:08 AM - edited 08-17-2019 07:40 PM
Hi @Eric Lloyd,
Suppose the TailFile processor is configured with the Tailing Mode property set to Multiple files, the Recursive Lookup property set to true, and a Run Schedule of, say, 10 sec (the exact interval doesn't matter).
The first time it runs on all nodes, it tails the files available in these directories and stores each file's timestamp in the processor state (you can check the state by right-clicking the processor and choosing View state).
When the processor runs again 10 seconds later, it checks the files recursively; if any file has changed relative to the stored state, it picks up the new data and updates the state in the processor.
Example: I have a file test.log:
bash# ll
-rwxrwxrwx 1 nifi nifi 5 Oct 12 18:43 test.log
If you check the state in NiFi, you'll see that NiFi converted the file's timestamp, Oct 12 18:43 (the 5 before it is the file size in bytes), to a Unix timestamp in milliseconds and stored it in the processor state.
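The conversion itself is the ordinary one from a file timestamp to epoch milliseconds. A small sketch: `ll` prints local time and omits the year for recent files, so the year 2017 and the UTC timezone below are assumptions. The time `ll` shows is the last-modified time, which Python exposes as `st_mtime_ns` from `os.stat`.

```python
import os
from datetime import datetime, timezone

def ls_time_to_epoch_ms(year: int, month: int, day: int, hour: int, minute: int,
                        tz=timezone.utc) -> int:
    """Convert a timestamp like 'Oct 12 18:43' (plus an assumed year and timezone) to epoch ms."""
    return int(datetime(year, month, day, hour, minute, tzinfo=tz).timestamp() * 1000)

def mtime_ms(path: str) -> int:
    """Last-modified time of a file in epoch milliseconds (the time ls -l displays)."""
    return os.stat(path).st_mtime_ns // 1_000_000

# 'Oct 12 18:43', assuming the year 2017 and UTC
print(ls_time_to_epoch_ms(2017, 10, 12, 18, 43))  # 1507833780000
```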
When it runs again, it compares the value stored in the processor state with the file's last-modified time. If the values differ, it tails the new data from that file and updates the state with the new timestamp; if they are the same, it doesn't tail the file.
In the same way, NiFi looks recursively through all the directories; if any file's timestamp has changed, it pulls the new data from that file and updates the state.
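A simplified model of that comparison, on plain dictionaries mapping a file path to its last-modified time in milliseconds. It only compares timestamps, as described here; NiFi's actual TailFile state also tracks things such as the read position, so treat this as an illustration of the idea rather than NiFi's code. The paths and millisecond values are made up.

```python
from typing import Dict, List

def files_to_tail(current_mtimes: Dict[str, int], state: Dict[str, int]) -> List[str]:
    """Return files whose last-modified time differs from the stored state.

    A file missing from the state (e.g. a newly created one) counts as changed.
    The state is updated in place so the next run compares against the new values.
    """
    changed = []
    for path, mtime in sorted(current_mtimes.items()):
        if state.get(path) != mtime:
            changed.append(path)
            state[path] = mtime  # remember the new timestamp for the next run
    return changed

state: Dict[str, int] = {}
# First run: nothing is stored yet, so both files are picked up
print(files_to_tail({"123.1/foo.log": 1000, "234.2/foo.log": 2000}, state))
# Second run: only 234.2/foo.log was written to, so only it is returned
print(files_to_tail({"123.1/foo.log": 1000, "234.2/foo.log": 3000}, state))
# Third run: a new version directory appears; only its file is returned
print(files_to_tail({"123.1/foo.log": 1000, "234.2/foo.log": 3000,
                     "332.22/foo.log": 4000}, state))
```

The three calls return `['123.1/foo.log', '234.2/foo.log']`, then `['234.2/foo.log']`, then `['332.22/foo.log']`.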
Now, let's take your case: if only 234.2/foo.log is being updated and 123.1/foo.log is not, the processor will only fetch data from 234.2/foo.log; it won't fetch 123.1/foo.log because that file hasn't been updated.
If a new directory is created or logs start being written to a new file, that's fine: the processor looks recursively for new files created since the state was stored, and it won't duplicate data it has already fetched.
NiFi will take care of newly created files and directories.
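To illustrate why a new version directory needs no configuration change: each run walks the base directory again, so a file that did not exist on the previous run is simply found on the next one. A minimal sketch using `os.walk` and a hypothetical regex matched against the path relative to the base directory; it models the behavior described, not NiFi's implementation. Combined with a timestamp comparison against the stored state, only new or changed files would then be read.

```python
import os
import re
import tempfile
from typing import Dict

# Hypothetical pattern, matched against the path relative to the base directory
PATTERN = re.compile(r"\d+(?:\.\d+)*/foo\.log")

def scan(base_dir: str) -> Dict[str, int]:
    """Recursively find matching files under base_dir; map relative path -> mtime in ms."""
    found = {}
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, base_dir).replace(os.sep, "/")
            if PATTERN.fullmatch(rel):
                found[rel] = os.stat(full).st_mtime_ns // 1_000_000
    return found

def touch_log(base_dir: str, version: str) -> None:
    """Create <base_dir>/<version>/foo.log (empty) to simulate a new version."""
    path = os.path.join(base_dir, version)
    os.makedirs(path, exist_ok=True)
    open(os.path.join(path, "foo.log"), "a").close()

with tempfile.TemporaryDirectory() as base:
    touch_log(base, "123.1")
    touch_log(base, "234.2")
    print(sorted(scan(base)))  # ['123.1/foo.log', '234.2/foo.log']

    # A new version directory appears between runs and is found on the next scan
    touch_log(base, "332.22")
    print(sorted(scan(base)))  # ['123.1/foo.log', '234.2/foo.log', '332.22/foo.log']
```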
Created 10-13-2017 09:43 PM
Thanks, that was a great answer.