<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: NiFi: How does the ListHDFS processor work? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195531#M68142</link>
    <description>&lt;P&gt;To make the listing start over again, do the following:&lt;/P&gt;&lt;P&gt;1. Open the "Component State" UI by right-clicking the ListHDFS processor and selecting "View state".&lt;/P&gt;&lt;P&gt;2. In that UI you will see a blue "Clear state" link, which clears the currently retained state.&lt;/P&gt;</description>
    <pubDate>Sat, 16 Sep 2017 03:39:45 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2017-09-16T03:39:45Z</dc:date>
    <item>
      <title>NiFi: How does the ListHDFS processor work?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195529#M68140</link>
      <description>&lt;P&gt;I want to read some files (which are put in an HDFS directory), and I want to use the ListHDFS processor for it. There are several questions I am interested in:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;When I start the ListHDFS processor, it will capture all files from the directory. If I then change its state (I mean, I stop the processor) and start it again, will it take only the files that were put in the directory recently, or all the files that are in the directory?&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Sat, 16 Sep 2017 01:47:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195529#M68140</guid>
      <dc:creator>salome_tkhilais</dc:creator>
      <dc:date>2017-09-16T01:47:58Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi: How does the ListHDFS processor work?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195530#M68141</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/30155/salometkhilaishvili.html" nodeid="30155"&gt;@sally sally&lt;/A&gt;&lt;P&gt;The processor will only list the files which were not included in the first listing it created.&lt;/P&gt;</description>
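The restart behaviour described in this reply can be illustrated with a minimal Python sketch, assuming a simplified model: the hypothetical IncrementalLister class keeps only the largest file modification time it has emitted, lists only files strictly newer than that value on each run, and forgets that value when its state is cleared. This is not NiFi's actual ListHDFS code, which may handle additional edge cases this sketch ignores; the part2.txt path and both timestamps are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HdfsFile:
    path: str
    mtime_ms: int  # modification time in Unix milliseconds


@dataclass
class IncrementalLister:
    """Hypothetical, simplified model of timestamp-based listing state.

    Not NiFi's ListHDFS implementation: it only remembers the largest
    modification time emitted so far and lists files newer than that.
    """

    state_ms: int = -1  # -1 means no state has been stored yet

    def list_new(self, files: list[HdfsFile]) -> list[str]:
        # Emit only files strictly newer than the stored timestamp.
        new = [f for f in files if f.mtime_ms > self.state_ms]
        if new:
            # Persist the newest timestamp as the state for the next run.
            self.state_ms = max(f.mtime_ms for f in new)
        return [f.path for f in new]

    def clear_state(self) -> None:
        # Models "Clear state": the next run lists every file again.
        self.state_ms = -1


if __name__ == "__main__":
    directory = [HdfsFile("/user/yashu/test/part1.txt", 1505506613479)]
    lister = IncrementalLister()
    print(lister.list_new(directory))  # ['/user/yashu/test/part1.txt']
    # Stopping and restarting keeps the state, so nothing is re-listed.
    print(lister.list_new(directory))  # []
    # Hypothetical file written to the directory later.
    directory.append(HdfsFile("/user/yashu/test/part2.txt", 1505506700000))
    print(lister.list_new(directory))  # ['/user/yashu/test/part2.txt']
    lister.clear_state()
    print(lister.list_new(directory))  # both files are listed again
```

The design choice here is to store a single high-water-mark timestamp rather than every file name, which keeps the state small no matter how many files the directory holds.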
      <pubDate>Sat, 16 Sep 2017 03:19:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195530#M68141</guid>
      <dc:creator>Wynner</dc:creator>
      <dc:date>2017-09-16T03:19:57Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi: How does the ListHDFS processor work?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195531#M68142</link>
      <description>&lt;P&gt;To make the listing start over again, do the following:&lt;/P&gt;&lt;P&gt;1. Open the "Component State" UI by right-clicking the ListHDFS processor and selecting "View state".&lt;/P&gt;&lt;P&gt;2. In that UI you will see a blue "Clear state" link, which clears the currently retained state.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Sep 2017 03:39:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195531#M68142</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2017-09-16T03:39:45Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi: How does the ListHDFS processor work?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195532#M68143</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/30155/salometkhilaishvili.html" nodeid="30155" target="_blank"&gt;@sally sally&lt;/A&gt;, the ListHDFS processor is designed to store its last state.&lt;BR /&gt;When you configure the ListHDFS processor, you specify a directory name in its properties. Once the processor has listed all the files that exist in that directory at that time, it stores as its state the latest time at which any of those files was stored in HDFS. You can view this state information by clicking the View state button.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="40382-state-listhdfs.png" style="width: 488px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18002i06572DE05F9A6260/image-size/medium?v=v2&amp;amp;px=400" role="button" title="40382-state-listhdfs.png" alt="40382-state-listhdfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If you want to clear the state, open View state and click Clear state.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="40383-clear-state.png" style="width: 737px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18003i9353308600C8ED79/image-size/medium?v=v2&amp;amp;px=400" role="button" title="40383-clear-state.png" alt="40383-clear-state.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;2. Once the ListHDFS processor has saved its state, every subsequent run, whether the processor is scheduled as CRON driven or timer driven, checks only for new files stored after the state timestamp.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Note:&lt;/STRONG&gt; We run ListHDFS on the primary node only, but its state value is stored across all the nodes of the NiFi cluster. If the primary node changes, the new primary node uses the same state, so there won't be any issues with duplicates.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Example:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;hadoop fs -ls /user/yashu/test/
Found 1 items
-rw-r--r--   3 yash hdfs          3 2017-09-15 16:16 /user/yashu/test/part1.txt&lt;/PRE&gt;&lt;P&gt;When I configure the ListHDFS processor to list all the files in the above directory:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="40384-listhdfs-config.png" style="width: 728px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18004i87226255683943A5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="40384-listhdfs-config.png" alt="40384-listhdfs-config.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The state of the ListHDFS processor should then match the time at which part1.txt was stored in HDFS, which in our case is:&lt;/P&gt;&lt;PRE&gt;2017-09-15 16:16&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="40385-state.png" style="width: 734px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18005i4DA0E9DA762A5F9A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="40385-state.png" alt="40385-state.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The state is stored as Unix time in milliseconds. Converting the state time to a date-time format gives:&lt;/P&gt;&lt;PRE&gt;Unix time in milliseconds: 1505506613479
Timestamp                : 2017-09-15 16:16:53&lt;/PRE&gt;&lt;P&gt;Now that the processor has stored its state, each time it runs again it lists only the new files stored in the directory after the state timestamp, and it updates the state with the new state time (i.e., the timestamp of the most recently created file in the HDFS directory).&lt;/P&gt;</description>
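The millisecond-to-date conversion in this reply can be checked with a short Python sketch. Read as UTC, 1505506613479 ms is 2017-09-15 20:16:53.479, so the 16:16:53 shown in the post appears to be local time at a UTC-4 offset. That offset (for example US Eastern Daylight Time) is an assumption about the poster's clock, not something the post states.

```python
from datetime import datetime, timedelta, timezone

STATE_MS = 1505506613479  # Unix time in milliseconds, from the example state
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Add the milliseconds to the Unix epoch to get an exact UTC datetime.
utc = EPOCH + timedelta(milliseconds=STATE_MS)
print(utc.isoformat(sep=" ", timespec="milliseconds"))
# 2017-09-15 20:16:53.479+00:00

# Assumption: the poster's local clock was UTC-4 (e.g. US Eastern Daylight Time).
local = utc.astimezone(timezone(timedelta(hours=-4)))
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-09-15 16:16:53
```

Adding a timedelta to the epoch keeps the milliseconds exact, whereas dividing by 1000 and passing a float to datetime.fromtimestamp relies on floating-point rounding.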
      <pubDate>Sun, 18 Aug 2019 06:37:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-How-does-ListHdfs-processor-work/m-p/195532#M68143</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-08-18T06:37:15Z</dc:date>
    </item>
  </channel>
</rss>

