<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: GETSFTP with NiFi cluster in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220629#M182514</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/13600/johnmteabo.html" nodeid="13600" target="_blank"&gt;@John T&lt;/A&gt;&lt;/P&gt;&lt;P&gt;When you use GetSFTP in a cluster, you are duplicating your data. Each node will ingest the same data.&lt;/P&gt;&lt;P&gt;You need to use the List/Fetch pattern. A great description of this feature is available here: &lt;A href="https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Now, if you used the List/Fetch pattern correctly and still don't get an even data distribution, you need to understand that the Site-to-Site protocol batches flow files for better network performance. This means that if you have 3 flow files of a few KB or MB to send, NiFi decides to send them to one node rather than using 3 connections. The decision is taken based on data size, number of flow files and transmission duration. Because of this, you don't see the data being distributed when you run tests, since you usually test with a few small files.&lt;/P&gt;&lt;P&gt;The batching thresholds have default values, but you can change them for each input port. 
Go to the RPG (Remote Process Group), open Input ports, then click on the edit pen for your input port and you get these settings:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="77659-screen-shot-2018-06-13-at-95116-am.png" style="width: 814px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15850i71CD19AB74BBD0C0/image-size/medium?v=v2&amp;amp;px=400" role="button" title="77659-screen-shot-2018-06-13-at-95116-am.png" alt="77659-screen-shot-2018-06-13-at-95116-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="77660-screen-shot-2018-06-13-at-95136-am.png" style="width: 812px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15851i6C19D673A6819CCF/image-size/medium?v=v2&amp;amp;px=400" role="button" title="77660-screen-shot-2018-06-13-at-95136-am.png" alt="77660-screen-shot-2018-06-13-at-95136-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I hope this helps you understand the behavior.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 02:24:28 GMT</pubDate>
    <dc:creator>ahadjidj</dc:creator>
    <dc:date>2019-08-18T02:24:28Z</dc:date>
    <item>
      <title>GETSFTP with NiFi cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220626#M182511</link>
      <description>&lt;P&gt;I'm using NiFi 1.6.0 in a 3-node cluster.&lt;/P&gt;&lt;P&gt;When I use GETSFTP (set to run on all nodes) in a clustered NiFi, the cluster seems to distribute the acquired data evenly among the nodes.&lt;/P&gt;&lt;P&gt;Does this mean that all 3 servers GetSFTP the data evenly?&lt;/P&gt;&lt;P&gt;I also tried using FETCH SFTP to get the listings and then did a Site-to-Site back to my own cluster, and it did NOT distribute the 0-byte flow files evenly among the nodes so that the FetchSFTP load would be evenly distributed.&lt;/P&gt;&lt;P&gt;What would be the best practice to load balance GETSFTP in a NiFi cluster?&lt;/P&gt;&lt;P&gt;John&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jun 2018 07:31:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220626#M182511</guid>
      <dc:creator>johnmteabo</dc:creator>
      <dc:date>2018-06-13T07:31:14Z</dc:date>
    </item>
    <item>
      <title>Re: GETSFTP with NiFi cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220627#M182512</link>
      <description>&lt;P&gt;Do you mean you tried ListSFTP to get the listings?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jun 2018 07:34:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220627#M182512</guid>
      <dc:creator>mburgess</dc:creator>
      <dc:date>2018-06-13T07:34:51Z</dc:date>
    </item>
    <item>
      <title>Re: GETSFTP with NiFi cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220628#M182513</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13600/johnmteabo.html" nodeid="13600"&gt;@John T&lt;/A&gt; Take a look at &lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt;'s answer to a similar question here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/141921/nifi-all-files-are-processed-by-sftp-processor-on.html"&gt;https://community.hortonworks.com/questions/141921/nifi-all-files-are-processed-by-sftp-processor-on.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jun 2018 10:25:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220628#M182513</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2018-06-13T10:25:35Z</dc:date>
    </item>
    <item>
      <title>Re: GETSFTP with NiFi cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220629#M182514</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/13600/johnmteabo.html" nodeid="13600" target="_blank"&gt;@John T&lt;/A&gt;&lt;/P&gt;&lt;P&gt;When you use GetSFTP in a cluster, you are duplicating your data. Each node will ingest the same data.&lt;/P&gt;&lt;P&gt;You need to use the List/Fetch pattern. A great description of this feature is available here: &lt;A href="https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://pierrevillard.com/2017/02/23/listfetch-pattern-and-remote-process-group-in-apache-nifi/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Now, if you used the List/Fetch pattern correctly and still don't get an even data distribution, you need to understand that the Site-to-Site protocol batches flow files for better network performance. This means that if you have 3 flow files of a few KB or MB to send, NiFi decides to send them to one node rather than using 3 connections. The decision is taken based on data size, number of flow files and transmission duration. Because of this, you don't see the data being distributed when you run tests, since you usually test with a few small files.&lt;/P&gt;&lt;P&gt;The batching thresholds have default values, but you can change them for each input port. 
Go to the RPG (Remote Process Group), open Input ports, then click on the edit pen for your input port and you get these settings:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="77659-screen-shot-2018-06-13-at-95116-am.png" style="width: 814px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15850i71CD19AB74BBD0C0/image-size/medium?v=v2&amp;amp;px=400" role="button" title="77659-screen-shot-2018-06-13-at-95116-am.png" alt="77659-screen-shot-2018-06-13-at-95116-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="77660-screen-shot-2018-06-13-at-95136-am.png" style="width: 812px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15851i6C19D673A6819CCF/image-size/medium?v=v2&amp;amp;px=400" role="button" title="77660-screen-shot-2018-06-13-at-95136-am.png" alt="77660-screen-shot-2018-06-13-at-95136-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I hope this helps you understand the behavior.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 02:24:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220629#M182514</guid>
      <dc:creator>ahadjidj</dc:creator>
      <dc:date>2019-08-18T02:24:28Z</dc:date>
    </item>
    <item>
      <title>Re: GETSFTP with NiFi cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220630#M182515</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html" target="_blank"&gt;https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jun 2018 22:18:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/GETSFTP-with-NiFi-cluster/m-p/220630#M182515</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2018-06-13T22:18:34Z</dc:date>
    </item>
  </channel>
</rss>

