<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Nifi: how to use fileFileter for fetching files from hadoop? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176669#M70434</link>
    <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/144502/nifi-how-to-use-filefileter-for-fetching-files-fro.html#" rel="nofollow noopener noreferrer" target="_blank"&gt;@sally sally&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Yes, you can do this in several ways using NiFi processors:&lt;/P&gt;&lt;P&gt;1. Using the GetHDFS processor (pure NiFi processors).&lt;/P&gt;&lt;P&gt;2. Using the ListHDFS processor (pure NiFi processors).&lt;/P&gt;&lt;P&gt;3. Running a script, adding the results as attributes to the flowfile, and using them in the FetchHDFS processor.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Method 1:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Using the GetHDFS processor:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;For testing, I have these 4 files in the folder2 directory, and I want to fetch only the files whose names start with 2011.&lt;/P&gt;&lt;PRE&gt;hadoop fs -ls /user/yashu/folder2/
Found 4 items
-rw-r--r--   3 hdfs         27 2017-10-30 09:16 /user/yashu/folder2/2011-01-01.1
-rw-r--r--   3 hdfs        359 2017-10-20 08:47 /user/yashu/folder2/hbase.txt
-rw-r--r--   3 hdfs         24 2017-10-09 21:45 /user/yashu/folder2/sam.txt
-rw-r--r--   3 hdfs         12 2017-10-09 21:45 /user/yashu/folder2/sam1.txt&lt;/PRE&gt;&lt;P&gt;Use the GetHDFS processor and configure these properties:&lt;/P&gt;&lt;P&gt;1. Keep Source File: set it to true if you want to keep the source file in the directory (the default is false). Leave it false if you want the file deleted after it is fetched.&lt;/P&gt;&lt;P&gt;2. Directory: give the path of your directory.&lt;/P&gt;&lt;P&gt;3. File Filter Regex: give a regex that matches the filenames you need.&lt;/P&gt;&lt;P&gt;For example, I need only the files starting with 2011, so I set the regex to:&lt;/P&gt;&lt;PRE&gt;2011.*&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41613-gethdfs.png" style="width: 722px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19665i9A25CB56A6F3CC0F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41613-gethdfs.png" alt="41613-gethdfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The processor now fetches only the /user/yashu/folder2/2011-01-01.1 file from the directory.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Method 2:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Using the ListHDFS processor:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Configure your directory path in the ListHDFS processor; it will list all the files in that directory. 
We cannot filter out the files we need in the ListHDFS processor itself, but every flowfile it emits has a filename attribute.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41617-listhdfs.png" style="width: 591px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19666iE391DC25CF795FC7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41617-listhdfs.png" alt="41617-listhdfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;We can use that filename attribute in a RouteOnAttribute processor.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;RouteOnAttribute:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Add a new property in &lt;STRONG&gt;RouteOnAttribute&lt;/STRONG&gt;; &lt;STRONG&gt;the processor then works as a file filter for the flowfiles&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Property:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;requiredfilenames&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;PRE&gt;${filename:matches('2011.*')}&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;This property routes a flowfile only if its filename satisfies the expression above (matches() requires the whole filename to match the regex).&lt;/P&gt;&lt;P&gt;The other files (sam.txt, sam1.txt, etc.) do not match, so only the 2011 file is routed to the requiredfilenames relationship; the rest go to the unmatched relationship. 
&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41619-routeonattribute.png" style="width: 624px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19667i00C20650BF7A2F96/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41619-routeonattribute.png" alt="41619-routeonattribute.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Flow:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41620-flow.png" style="width: 439px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19668i9523CE992FF0E34B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41620-flow.png" alt="41620-flow.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Method 3:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Run a script:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;You can &lt;STRONG&gt;run&lt;/STRONG&gt; the &lt;STRONG&gt;script&lt;/STRONG&gt;, then use processors such as ExtractText to &lt;STRONG&gt;extract the filename and path&lt;/STRONG&gt; from the &lt;STRONG&gt;result&lt;/STRONG&gt;, and use those attributes in the &lt;STRONG&gt;FetchHDFS&lt;/STRONG&gt; processor.&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 09:54:27 GMT</pubDate>
    <dc:creator>Shu_ashu</dc:creator>
    <dc:date>2019-08-18T09:54:27Z</dc:date>
    <item>
      <title>Nifi: how to use fileFileter for fetching files from hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176668#M70433</link>
      <description>&lt;P&gt;I want to fetch files from a Hadoop directory based on their filename; logically it looks like ${filename}.* (I have several files with similar names, like 2011-01-01.1, 2011-01-01.2, etc.). I tried ListHDFS + FetchHDFS, but they can't match this logic.&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;Can you suggest a better way to do this inside the NiFi environment?&lt;/LI&gt;&lt;LI&gt;Is it possible to do this with Groovy code inside the ExecuteScript processor?&lt;/LI&gt;&lt;LI&gt;How can I connect to an HDFS directory from Groovy code?&lt;/LI&gt;&lt;LI&gt;After getting these files, I need to put them in a flowfile list and must not transfer the flowfiles until the list size matches the value of the count attribute (stored in the flowfile).&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Mon, 30 Oct 2017 15:06:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176668#M70433</guid>
      <dc:creator>salome_tkhilais</dc:creator>
      <dc:date>2017-10-30T15:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi: how to use fileFileter for fetching files from hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176669#M70434</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/144502/nifi-how-to-use-filefileter-for-fetching-files-fro.html#" rel="nofollow noopener noreferrer" target="_blank"&gt;@sally sally&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Yes, you can do this in several ways using NiFi processors:&lt;/P&gt;&lt;P&gt;1. Using the GetHDFS processor (pure NiFi processors).&lt;/P&gt;&lt;P&gt;2. Using the ListHDFS processor (pure NiFi processors).&lt;/P&gt;&lt;P&gt;3. Running a script, adding the results as attributes to the flowfile, and using them in the FetchHDFS processor.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Method 1:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Using the GetHDFS processor:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;For testing, I have these 4 files in the folder2 directory, and I want to fetch only the files whose names start with 2011.&lt;/P&gt;&lt;PRE&gt;hadoop fs -ls /user/yashu/folder2/
Found 4 items
-rw-r--r--   3 hdfs         27 2017-10-30 09:16 /user/yashu/folder2/2011-01-01.1
-rw-r--r--   3 hdfs        359 2017-10-20 08:47 /user/yashu/folder2/hbase.txt
-rw-r--r--   3 hdfs         24 2017-10-09 21:45 /user/yashu/folder2/sam.txt
-rw-r--r--   3 hdfs         12 2017-10-09 21:45 /user/yashu/folder2/sam1.txt&lt;/PRE&gt;&lt;P&gt;Use the GetHDFS processor and configure these properties:&lt;/P&gt;&lt;P&gt;1. Keep Source File: set it to true if you want to keep the source file in the directory (the default is false). Leave it false if you want the file deleted after it is fetched.&lt;/P&gt;&lt;P&gt;2. Directory: give the path of your directory.&lt;/P&gt;&lt;P&gt;3. File Filter Regex: give a regex that matches the filenames you need.&lt;/P&gt;&lt;P&gt;For example, I need only the files starting with 2011, so I set the regex to:&lt;/P&gt;&lt;PRE&gt;2011.*&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41613-gethdfs.png" style="width: 722px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19665i9A25CB56A6F3CC0F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41613-gethdfs.png" alt="41613-gethdfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The processor now fetches only the /user/yashu/folder2/2011-01-01.1 file from the directory.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Method 2:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Using the ListHDFS processor:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Configure your directory path in the ListHDFS processor; it will list all the files in that directory. 
We cannot filter out the files we need in the ListHDFS processor itself, but every flowfile it emits has a filename attribute.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41617-listhdfs.png" style="width: 591px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19666iE391DC25CF795FC7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41617-listhdfs.png" alt="41617-listhdfs.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;We can use that filename attribute in a RouteOnAttribute processor.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;RouteOnAttribute:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Add a new property in &lt;STRONG&gt;RouteOnAttribute&lt;/STRONG&gt;; &lt;STRONG&gt;the processor then works as a file filter for the flowfiles&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Property:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;requiredfilenames&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;PRE&gt;${filename:matches('2011.*')}&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;This property routes a flowfile only if its filename satisfies the expression above (matches() requires the whole filename to match the regex).&lt;/P&gt;&lt;P&gt;The other files (sam.txt, sam1.txt, etc.) do not match, so only the 2011 file is routed to the requiredfilenames relationship; the rest go to the unmatched relationship. 
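&lt;/P&gt;&lt;P&gt;To illustrate the matching rule, here is a rough Python model (not NiFi code) of the RouteOnAttribute step above. It assumes the default "Route to Property name" routing strategy and uses Python's re.fullmatch and re.search as stand-ins for the Java regex behind the Expression Language functions matches() and find(); the two regex engines agree on a simple pattern like 2011.* but may differ on advanced syntax. The helper names el_matches, el_find and route_on_attribute are hypothetical.&lt;/P&gt;&lt;PRE&gt;
```python
import re

# Filenames taken from the "hadoop fs -ls /user/yashu/folder2/" listing above.
FILENAMES = ["2011-01-01.1", "hbase.txt", "sam.txt", "sam1.txt"]


def el_matches(subject, regex):
    # Models EL matches(): true only if the WHOLE subject matches the regex.
    return re.fullmatch(regex, subject) is not None


def el_find(subject, regex):
    # Models EL find(): true if ANY part of the subject matches the regex.
    return re.search(regex, subject) is not None


def route_on_attribute(filenames, prop_name, regex):
    # Rough model of RouteOnAttribute with "Route to Property name":
    # matching flowfiles go to the relationship named after the property,
    # everything else goes to "unmatched".
    routes = {prop_name: [], "unmatched": []}
    for name in filenames:
        target = prop_name if el_matches(name, regex) else "unmatched"
        routes[target].append(name)
    return routes


if __name__ == "__main__":
    routes = route_on_attribute(FILENAMES, "requiredfilenames", "2011.*")
    print(routes["requiredfilenames"])  # ['2011-01-01.1']
    print(len(routes["unmatched"]))     # 3
    print(el_matches("sam2011.txt", "2011.*"), el_find("sam2011.txt", "2011.*"))  # False True
```
&lt;/PRE&gt;&lt;P&gt;Because matches() needs the whole filename to match, a name such as sam2011.txt would not be routed even though it contains 2011, whereas find() would accept it. 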
&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41619-routeonattribute.png" style="width: 624px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19667i00C20650BF7A2F96/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41619-routeonattribute.png" alt="41619-routeonattribute.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Flow:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="41620-flow.png" style="width: 439px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19668i9523CE992FF0E34B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="41620-flow.png" alt="41620-flow.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Method 3:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Run a script:&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;You can &lt;STRONG&gt;run&lt;/STRONG&gt; the &lt;STRONG&gt;script&lt;/STRONG&gt;, then use processors such as ExtractText to &lt;STRONG&gt;extract the filename and path&lt;/STRONG&gt; from the &lt;STRONG&gt;result&lt;/STRONG&gt;, and use those attributes in the &lt;STRONG&gt;FetchHDFS&lt;/STRONG&gt; processor.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 09:54:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176669#M70434</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-08-18T09:54:27Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi: how to use fileFileter for fetching files from hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176670#M70435</link>
      <description>&lt;P&gt;First of all, thank you for your answer &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;. In this case, how can I find the number of flowfiles whose names match "2011.*"? I need to find this value and check whether it equals my count attribute (the main problem is that I can't get the exact number of flowfiles that match the regex "2011.*").&lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2017 23:19:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176670#M70435</guid>
      <dc:creator>salome_tkhilais</dc:creator>
      <dc:date>2017-10-30T23:19:02Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi: how to use fileFileter for fetching files from hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176671#M70436</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; &lt;/P&gt;&lt;P&gt;how can i use this  for multiple file base on one file name &lt;/P&gt;&lt;P&gt;example :- input path contains 3 files and one is .done.cvs&lt;/P&gt;&lt;P&gt;                 emp.csv&lt;/P&gt;&lt;P&gt;                 dept.csv&lt;/P&gt;&lt;P&gt;                 account.csv&lt;/P&gt;&lt;P&gt;                 date.done.csv&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;if the input path contains the .done.csv then only my file should route in nifi flow .&lt;/P&gt;&lt;P&gt;else it should not be route .&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Mar 2019 11:43:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176671#M70436</guid>
      <dc:creator>nitindamle_123</dc:creator>
      <dc:date>2019-03-28T11:43:19Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi: how to use fileFileter for fetching files from hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176672#M70437</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; &lt;/P&gt;&lt;P&gt;how can i use this for multiple file base on one file name&lt;/P&gt;&lt;P&gt;example :- input path contains 3 files and one is .done.cvs&lt;/P&gt;&lt;P&gt;emp.csv&lt;/P&gt;&lt;P&gt;dept.csv&lt;/P&gt;&lt;P&gt;account.csv&lt;/P&gt;&lt;P&gt;date.done.csv&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;if the input path contains the .done.csv then only my file should route in nifi flow .&lt;/P&gt;&lt;P&gt;else it should not be route .&lt;/P&gt;</description>
      <pubDate>Thu, 28 Mar 2019 11:46:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-how-to-use-fileFileter-for-fetching-files-from-hadoop/m-p/176672#M70437</guid>
      <dc:creator>nitindamle_123</dc:creator>
      <dc:date>2019-03-28T11:46:09Z</dc:date>
    </item>
  </channel>
</rss>

