<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How TailFile works with multiple files in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-TailFile-works-with-multiple-files/m-p/219300#M69590</link>
    <description>&lt;P&gt;Thanks, that was a great answer.&lt;/P&gt;</description>
    <pubDate>Sat, 14 Oct 2017 04:43:30 GMT</pubDate>
    <dc:creator>elloyd</dc:creator>
    <dc:date>2017-10-14T04:43:30Z</dc:date>
    <item>
      <title>How TailFile works with multiple files</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-TailFile-works-with-multiple-files/m-p/219298#M69588</link>
      <description>&lt;P&gt;Hello, we are seeing some behavior that seems to indicate how the TailFile processor works when tailing multiple files, but I want to verify it with someone who knows the processor.&lt;/P&gt;&lt;P&gt;Here's our setup...&lt;/P&gt;&lt;P&gt;We have a two-node NiFi cluster. We are tailing a specific log, call it foo.log, which exists in a separate copy for each version under a versions folder.&lt;/P&gt;&lt;P&gt;To illustrate, we are tailing these files:&lt;/P&gt;&lt;P&gt;/var/foobar/versions/123.1/foo.log&lt;/P&gt;&lt;P&gt;/var/foobar/versions/234.2/foo.log&lt;/P&gt;&lt;P&gt;Now, on the initial run of the TailFile processor, the foo.log in 123.1 is no longer receiving data, since data is now going into 234.2, the newer version. What we are seeing is that the tailed data comes only from 234.2 (which is awesome, and what we want to happen; we feared it would read in the foo.log from 123.1, even though it is no longer receiving data, as well as the incoming data from 234.2).&lt;/P&gt;&lt;P&gt;Is it TailFile's behavior to tail only the files that are receiving data and skip the ones that aren't? If so, that would indicate to me that when we release another version and &lt;/P&gt;&lt;P&gt;/var/foobar/versions/332.22/foo.log &lt;/P&gt;&lt;P&gt;appears and data stops going into 234.2, it would stop tailing 234.2 (makes sense) and start pulling data from 332.22... Testing this is proving rather difficult, so I was hoping to get some verification from someone who knows the functionality better.&lt;/P&gt;&lt;P&gt;PS: we have managed to use a regex to grab foo.log from any folder under versions whose name is composed of digits and decimal points, so that seems to be working.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:23:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-TailFile-works-with-multiple-files/m-p/219298#M69588</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2022-09-16T12:23:45Z</dc:date>
    </item>
    <item>
      <title>Re: How TailFile works with multiple files</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-TailFile-works-with-multiple-files/m-p/219299#M69589</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/15261/elloyd.html" nodeid="15261" target="_blank"&gt;@Eric Lloyd&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Suppose the &lt;STRONG&gt;TailFile&lt;/STRONG&gt; processor is configured with the Tailing Mode property set to &lt;STRONG&gt;Multiple files&lt;/STRONG&gt; and the &lt;STRONG&gt;Recursive Lookup&lt;/STRONG&gt; property set to &lt;STRONG&gt;True&lt;/STRONG&gt;, with a Run Schedule of &lt;STRONG&gt;10&lt;/STRONG&gt; sec (the exact value doesn't matter).&lt;/P&gt;&lt;P&gt;The first time it runs on &lt;STRONG&gt;all nodes&lt;/STRONG&gt;, it &lt;STRONG&gt;tails&lt;/STRONG&gt; the &lt;STRONG&gt;files&lt;/STRONG&gt; available in these &lt;STRONG&gt;directories&lt;/STRONG&gt; and stores the &lt;STRONG&gt;state&lt;/STRONG&gt; as a file &lt;STRONG&gt;timestamp&lt;/STRONG&gt; (you can &lt;STRONG&gt;check&lt;/STRONG&gt; the state by &lt;STRONG&gt;right&lt;/STRONG&gt;-clicking the&lt;STRONG&gt; processor&lt;/STRONG&gt; --&amp;gt; &lt;STRONG&gt;View state&lt;/STRONG&gt;).&lt;/P&gt;&lt;P&gt;When the processor runs again after &lt;STRONG&gt;10 sec&lt;/STRONG&gt;, it checks the &lt;STRONG&gt;files recursively&lt;/STRONG&gt;; if the &lt;STRONG&gt;state&lt;/STRONG&gt; of any file has &lt;STRONG&gt;changed&lt;/STRONG&gt;, it pulls the &lt;STRONG&gt;new data&lt;/STRONG&gt; and updates the &lt;STRONG&gt;state in the processor&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Example: &lt;/U&gt;&lt;/STRONG&gt;I have a test.log file:&lt;/P&gt;&lt;PRE&gt;bash# ll 
-rwxrwxrwx 1 nifi nifi 5 Oct 12 18:43 test.log&lt;/PRE&gt;&lt;P&gt;If you check the state in NiFi:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="40826-state.png" style="width: 429px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15992iB6D698F4F53369BE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="40826-state.png" alt="40826-state.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;That means &lt;STRONG&gt;NiFi converted&lt;/STRONG&gt; the &lt;STRONG&gt;file's last-modified time&lt;/STRONG&gt;, i.e. &lt;STRONG&gt;Oct 12 18:43&lt;/STRONG&gt; (the 5 before it in the listing is the file size in bytes), to a &lt;STRONG&gt;Unix timestamp in milliseconds&lt;/STRONG&gt; and &lt;STRONG&gt;stored&lt;/STRONG&gt; it in the processor.&lt;/P&gt;&lt;P&gt;When it runs again, it compares the value stored in the processor state with the file's last-modified time. If the values differ, it tails that file again and updates the state with the new timestamp. If the values are the same, it won't tail the file.&lt;/P&gt;&lt;P&gt;In the same way, &lt;STRONG&gt;NiFi&lt;/STRONG&gt; looks recursively through all directories; if the timestamp of any file has changed, it pulls from that file and updates the state. 
&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Now, &lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Let's take your case: if only 234.2/foo.log is being updated and 123.1/foo.log is not, the processor will &lt;B&gt;tail only 234.2/foo.log&lt;/B&gt;; it &lt;B&gt;won't &lt;/B&gt;pick up &lt;B&gt;123.1/foo.log&lt;/B&gt; because it has &lt;B&gt;not been updated&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;If a new &lt;STRONG&gt;directory&lt;/STRONG&gt; is &lt;STRONG&gt;created&lt;/STRONG&gt;, or logs are written to a &lt;STRONG&gt;new file&lt;/STRONG&gt;, it &lt;B&gt;doesn't matter&lt;/B&gt;, because the processor is &lt;B&gt;recursively looking for new files&lt;/B&gt; created after the state stored in the processor, and it &lt;B&gt;won't duplicate &lt;/B&gt;data that &lt;B&gt;was already fetched&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;NiFi will take care of newly created files and directories.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 02:40:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-TailFile-works-with-multiple-files/m-p/219299#M69589</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-08-18T02:40:29Z</dc:date>
    </item>
    <item>
      <title>Re: How TailFile works with multiple files</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-TailFile-works-with-multiple-files/m-p/219300#M69590</link>
      <description>&lt;P&gt;Thanks, that was a great answer.&lt;/P&gt;</description>
      <pubDate>Sat, 14 Oct 2017 04:43:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-TailFile-works-with-multiple-files/m-p/219300#M69590</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2017-10-14T04:43:30Z</dc:date>
    </item>
  </channel>
</rss>

