<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Nifi processor that deletes the older day files in HDFS. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-processor-that-deletes-the-older-day-files-in-HDFS/m-p/221178#M82214</link>
    <description>&lt;P&gt;I am planning to add a processor that runs a query on Hive and stores the results in HDFS as a CSV file, with a timestamp as the file name. I want this job to run every 24 hours. In parallel, I want a processor that deletes the previous day's files from HDFS every day.&lt;/P&gt;&lt;P&gt;For this I need a processor that puts the timestamp in the output file name and a processor that deletes files from HDFS.&lt;/P&gt;&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt;  &lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 15 Aug 2018 23:55:13 GMT</pubDate>
    <dc:creator>saikrishnamakin</dc:creator>
    <dc:date>2018-08-15T23:55:13Z</dc:date>
    <item>
      <title>Nifi processor that deletes the older day files in HDFS.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-processor-that-deletes-the-older-day-files-in-HDFS/m-p/221178#M82214</link>
      <description>&lt;P&gt;I am planning to add a processor that runs a query on Hive and stores the results in HDFS as a CSV file, with a timestamp as the file name. I want this job to run every 24 hours. In parallel, I want a processor that deletes the previous day's files from HDFS every day.&lt;/P&gt;&lt;P&gt;For this I need a processor that puts the timestamp in the output file name and a processor that deletes files from HDFS.&lt;/P&gt;&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt;  &lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Aug 2018 23:55:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-processor-that-deletes-the-older-day-files-in-HDFS/m-p/221178#M82214</guid>
      <dc:creator>saikrishnamakin</dc:creator>
      <dc:date>2018-08-15T23:55:13Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi processor that deletes the older day files in HDFS.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-processor-that-deletes-the-older-day-files-in-HDFS/m-p/221179#M82215</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/56174/saikrishnamakineni-1.html" nodeid="56174"&gt;@Sai Krishna Makineni&lt;/A&gt;&lt;P&gt;You can use either the &lt;STRONG&gt;ListHDFS (or) GetHDFSFileInfo&lt;/STRONG&gt; processor. GetHDFSFileInfo does not store state, so you can schedule it to run nightly. Once you have listed the files from HDFS, you can use the &lt;STRONG&gt;hdfs.lastModified&lt;/STRONG&gt; attribute &lt;B&gt;(or) apply the substringAfter function to your filename&lt;/B&gt; and check the timestamp value in a RouteOnAttribute processor.&lt;/P&gt;&lt;P&gt;Once you have filtered out the files that are older than a specific time, feed them to the DeleteHDFS processor to delete them.&lt;/P&gt;&lt;P&gt;Note that the &lt;STRONG&gt;ListHDFS processor stores state&lt;/STRONG&gt; and lists only incrementally, so if you want it to list the existing files again, clear its state using the REST API with&lt;/P&gt;&lt;PRE&gt;/processors/{id}/state/clear-requests&lt;/PRE&gt;&lt;P&gt;Then run the processor once the state has been cleared.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Flow:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;1.ListHDFS&lt;BR /&gt;2.RouteOnAttribute //check the filename (or) lastmodified time&lt;BR /&gt;3.DeleteHDFS //delete the files in hdfs&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Flow:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;1.GenerateFlowFile&lt;BR /&gt;2.GetHDFSFileInfo&lt;BR /&gt;3.RouteOnAttribute&lt;BR /&gt;4.DeleteHDFS&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;(or)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;You can use the GetHDFS processor &lt;STRONG&gt;(with Keep Source File set to true)&lt;/STRONG&gt;, which doesn't store state. However, this processor fetches the file contents from HDFS, so if the files are big it puts a lot of load on NiFi.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Aug 2018 04:45:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-processor-that-deletes-the-older-day-files-in-HDFS/m-p/221179#M82215</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-08-16T04:45:18Z</dc:date>
    </item>
  </channel>
</rss>

