How TailFile works with multiple files

Expert Contributor

Hello, we are seeing some behavior that seems to indicate how TailFile handles multiple files, but I want to verify it with someone who knows how the processor works when tailing multiple files.

Here's our setup...

We have a two-node NiFi cluster. We are tailing a specific log, call it foo.log, which exists once per version under a versions folder.

To illustrate, we are tailing these files:

/var/foobar/versions/123.1/foo.log

/var/foobar/versions/234.2/foo.log

Now, on the initial run of the TailFile processor, the foo.log in 123.1 is no longer receiving data, because data is now going into 234.2, the newer version. What we are seeing is that the tailed data comes only from 234.2 (which is awesome, and what we want to happen; we feared it would also read in the foo.log from 123.1, even though that file is no longer receiving data, along with the incoming data from 234.2).

Is it TailFile's intended behavior to tail only the files that are receiving data and to skip the ones that aren't? That would suggest to me that when we release another version and

/var/foobar/versions/332.22/foo.log

appears, and data stops going into 234.2, it would stop tailing 234.2 (makes sense) and start pulling data from 332.22. Testing this is proving rather difficult, so I was hoping we could get some verification from someone who knows the functionality better.

PS: we have managed to use a regex to grab foo.log from any folder under versions whose name is composed of digits and decimal points, so that part seems to be working.

1 ACCEPTED SOLUTION

Master Guru

Hi @Eric Lloyd,

Assume the TailFile processor is configured with the Tailing mode property set to Multiple files and the Recursive lookup property set to true, and with a Run Schedule of 10 sec (the exact interval doesn't matter).

The first time it runs on all nodes, it tails the files available in these directories and stores the file timestamp as state (you can check the state by right-clicking the processor --> clicking the View state button).

When the processor runs again after 10 sec, it checks the files recursively. If the state of any file has changed, it picks up the changed files and updates the state in the processor.

Example: I have a test.log file

bash# ll 
-rwxrwxrwx 1 nifi nifi 5 Oct 12 18:43 test.log

If you check the state in NiFi:

[Screenshot: TailFile processor state (40826-state.png)]

That means NiFi converted the file timestamp shown by ll, i.e. Oct 12 18:43 (the 5 in front of it is the file size in bytes), to a Unix timestamp in milliseconds and stored it in the processor. Note that ll shows the last-modified time of the file, not its creation time.

When it runs again, it compares the value stored in the processor state with the file's last-modified time. If the values differ, it tails that file again and updates the state with the new timestamp. If they are the same, it doesn't tail the file.
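The comparison described above can be sketched in a few lines of Python. This is only a simplified model, not NiFi's actual implementation: the function name poll and the layout of the state dictionary are made up for illustration, the timestamp is assumed to be the file's last-modified time, and the stored read position is an assumption added so that only the newly appended bytes are returned. File rotation and truncation are not handled.

```python
import os

def poll(path, state):
    """Return bytes appended to `path` since the last call, updating `state`.

    `state` maps path -> {"timestamp": mtime in ms, "position": bytes read}.
    Hypothetical helper: a simplified model, not NiFi's real code.
    """
    st = os.stat(path)
    mtime_ms = int(st.st_mtime * 1000)  # last-modified time in milliseconds
    entry = state.get(path, {"timestamp": -1, "position": 0})

    # Same timestamp and nothing beyond the stored position: skip the file.
    if mtime_ms == entry["timestamp"] and st.st_size == entry["position"]:
        return b""

    # Otherwise read only what lies past the stored position.
    with open(path, "rb") as f:
        f.seek(entry["position"])
        data = f.read()

    # Remember the new timestamp and how far we have read.
    state[path] = {
        "timestamp": mtime_ms,
        "position": entry["position"] + len(data),
    }
    return data
```

Calling poll(path, state) repeatedly with the same state dictionary returns new data only after the file has been appended to, which is the behavior you are seeing for the idle foo.log.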

In the same way, NiFi looks recursively through all the directories. If the timestamp of any file has changed, it pulls that file and updates the state.

Now, let's take your case: if only 234.2/foo.log is being updated and 123.1/foo.log is not, the processor will only fetch from 234.2/foo.log. It won't fetch 123.1/foo.log because that file has not been updated.

If a new directory is created, or logs are written to a new file, it doesn't matter: the processor looks recursively for files that were created or changed after the state it has stored, and it won't duplicate data that has already been fetched.

NiFi will take care of newly created files and directories.
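To make the recursive lookup easier to picture, here is a rough Python sketch of the idea. It is not how NiFi is implemented: the names PATTERN, find_files and tail_all are hypothetical, the regex is only a guess at the kind of pattern you described (a version folder made of digits and dots, then foo.log), and reading every matching file from the beginning on the first pass is an assumption of this sketch.

```python
import os
import re

# Hypothetical pattern: a digits-and-dots version folder, then foo.log.
PATTERN = re.compile(r"[\d.]+/foo\.log")

def find_files(base_dir, pattern=PATTERN):
    """Recursively list files whose path relative to base_dir matches pattern."""
    matches = []
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, base_dir).replace(os.sep, "/")
            if pattern.fullmatch(rel):
                matches.append(full)
    return sorted(matches)

def tail_all(base_dir, positions):
    """Return {path: new bytes} for every matching file that has grown.

    `positions` maps path -> bytes already read, so data that was fetched
    on an earlier pass is never returned twice.
    """
    out = {}
    for path in find_files(base_dir):
        start = positions.get(path, 0)  # unseen files start at 0
        with open(path, "rb") as f:
            f.seek(start)
            data = f.read()
        if data:
            out[path] = data
            positions[path] = start + len(data)
    return out
```

Each call to tail_all walks the base directory again, so a folder such as 332.22 is picked up on the first pass after it appears, while a file that has stopped growing simply contributes nothing.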

2 REPLIES

Expert Contributor

Thanks, that was a great answer.