<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Suggestions to handle high volume streaming data in NiFi in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126342#M51405</link>
    <description>&lt;P&gt;You can, but you are just shifting the problem downstream from ListenTCP to SplitText. SplitText now has to produce the thousands/millions of flow files that would otherwise have been coming out of ListenTCP. It is slightly better, though, because it gives ListenTCP a chance to keep up with the source.&lt;/P&gt;&lt;P&gt;It would be most efficient to avoid splitting into individual flow files if possible. Since you are merging things together before HDFS, it shouldn't matter whether you are merging many flow files with one message each, or a few flow files containing thousands of messages each.&lt;/P&gt;&lt;P&gt;It just comes down to whether you want to rewrite some of the logic in your custom processor.&lt;/P&gt;</description>
    <pubDate>Fri, 13 Jan 2017 02:51:38 GMT</pubDate>
    <dc:creator>bbende</dc:creator>
    <dc:date>2017-01-13T02:51:38Z</dc:date>
    <item>
      <title>Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126335#M51398</link>
      <description>&lt;P&gt;Hi guys,&lt;/P&gt;&lt;P&gt;I have a use case where we need to load near real-time streaming data into HDFS. The incoming data is high volume, about 1,500 messages per second. I have a NiFi dataflow where a ListenTCP processor ingests the streaming data, but the requirement is to check each incoming message for the required structure, so messages from ListenTCP go to a custom processor that does the structure checking. Only messages with the right structure move forward to a MergeContent processor and on to PutHDFS. Right now, the validation processor has become a bottleneck, and the backpressure from it is causing ListenTCP to queue messages at the source system (the one sending the messages).&lt;/P&gt;&lt;P&gt;Since the validation processor cannot keep up with the incoming data, I'm thinking of writing the messages from ListenTCP to the file system first and then letting the validation processor pick them up from there and continue forward. Is this the right approach to resolve this, or are there any alternatives you would suggest?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 11:54:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126335#M51398</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-01-12T11:54:43Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126336#M51399</link>
      <description>&lt;P&gt;Add more nodes to your NiFi cluster, add RAM, or move to a bigger box (more RAM, CPU, cores).&lt;/P&gt;&lt;P&gt;1,500 messages per second is not a lot for NiFi; you should be able to process 10k per second easily.&lt;/P&gt;&lt;P&gt;What are your JVM settings?&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html"&gt;https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This one is important:  &lt;A href="https://community.hortonworks.com/articles/30424/optimizing-performance-of-apache-nifis-network-lis.html"&gt;https://community.hortonworks.com/articles/30424/optimizing-performance-of-apache-nifis-network-lis.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;See:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/9782/nifihdf-dataflow-optimization-part-1-of-2.html"&gt;https://community.hortonworks.com/articles/9782/nifihdf-dataflow-optimization-part-1-of-2.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/content/kbentry/9785/nifihdf-dataflow-optimization-part-2-of-2.html"&gt;https://community.hortonworks.com/content/kbentry/9785/nifihdf-dataflow-optimization-part-2-of-2.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://dzone.com/articles/apache-nifi-10-cheatsheet"&gt;https://dzone.com/articles/apache-nifi-10-cheatsheet&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/68375/nifi-cluster-and-load-balancer.html"&gt;https://community.hortonworks.com/articles/68375/nifi-cluster-and-load-balancer.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Check to see if something is failing or where it is slow:&lt;/P&gt;&lt;P&gt;&lt;A href="https://dzone.com/articles/finding-nifi-errors"&gt;https://dzone.com/articles/finding-nifi-errors&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 13:15:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126336#M51399</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2017-01-12T13:15:00Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126337#M51400</link>
      <description>&lt;P&gt;See the comments above.&lt;/P&gt;&lt;P&gt;The main fix is to increase your JVM memory; if you add 12-16 GB you should be in good shape.&lt;/P&gt;&lt;P&gt;If it's a VM environment, give the node 16-32 or more cores. If that's not enough, move to multiple nodes in the cluster.&lt;/P&gt;&lt;P&gt;One node should scale to 10k/sec easily. How big are these files? Is anything failing? Are there errors in the logs?&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 13:19:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126337#M51400</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2017-01-12T13:19:28Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126338#M51401</link>
      <description>&lt;P&gt;Thanks a lot &lt;A rel="user" href="https://community.cloudera.com/users/9304/tspann.html" nodeid="9304" target="_blank"&gt;@Timothy Spann&lt;/A&gt;. I'm going to work with our admin on the JVM settings and the number of cores we have.&lt;/P&gt;&lt;P&gt;The flow files are small, about 5 KB each or less. The ListenTCP processor is throwing this error: "Internal queue at maximum capacity, could not queue event", and messages are queuing on the source system side. Below are the memory settings I set for ListenTCP.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="11377-listentcp-properties.png" style="width: 399px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/22654iA2D4501424924DE9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="11377-listentcp-properties.png" alt="11377-listentcp-properties.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 10:06:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126338#M51401</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2019-08-19T10:06:18Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126339#M51402</link>
      <description>&lt;P&gt;Also, I'm in the process of having the socket buffer (for ListenTCP) increased to 4 MB (the maximum the Unix admins can set it to).&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 22:40:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126339#M51402</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-01-12T22:40:47Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126340#M51403</link>
      <description>&lt;P&gt;The single biggest performance improvement for ListenTCP will be increasing the "Max Batch Size" from 1 to something like 1000, or maybe even more. The reason is that it will drastically reduce the number of flow files produced by ListenTCP, which in turn drastically reduces the amount of I/O to NiFi's internal repositories.&lt;/P&gt;&lt;P&gt;The downside is that you won't have a single message per flow file anymore, so your validation needs to work differently. If you can change your custom processor to stream in a flow file, read each line, and write only the validated lines to the output stream, then it should work well. If the validation processor is still the bottleneck, you could increase its concurrent tasks slightly so that it keeps up with the batches coming out of ListenTCP.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jan 2017 23:01:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126340#M51403</guid>
      <dc:creator>bbende</dc:creator>
      <dc:date>2017-01-12T23:01:19Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126341#M51404</link>
      <description>&lt;P&gt;Thanks for the suggestion &lt;A rel="user" href="https://community.cloudera.com/users/363/bbende.html" nodeid="363"&gt;@Bryan Bende&lt;/A&gt;; I'll try that approach, in addition to the JVM and core changes that Timothy mentioned.&lt;/P&gt;&lt;P&gt;I was thinking: if we batch messages at ListenTCP, could we not split them back into individual flow files with SplitText (the messages are text) before passing them to our custom processor, minimizing the rework in the custom processor? Do you see any performance hit with this idea?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jan 2017 00:56:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126341#M51404</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-01-13T00:56:41Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126342#M51405</link>
      <description>&lt;P&gt;You can, but you are just shifting the problem downstream from ListenTCP to SplitText. SplitText now has to produce the thousands/millions of flow files that would otherwise have been coming out of ListenTCP. It is slightly better, though, because it gives ListenTCP a chance to keep up with the source.&lt;/P&gt;&lt;P&gt;It would be most efficient to avoid splitting into individual flow files if possible. Since you are merging things together before HDFS, it shouldn't matter whether you are merging many flow files with one message each, or a few flow files containing thousands of messages each.&lt;/P&gt;&lt;P&gt;It just comes down to whether you want to rewrite some of the logic in your custom processor.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jan 2017 02:51:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126342#M51405</guid>
      <dc:creator>bbende</dc:creator>
      <dc:date>2017-01-13T02:51:38Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126343#M51406</link>
      <description>&lt;P&gt;Thanks for clarifying &lt;A rel="user" href="https://community.cloudera.com/users/363/bbende.html" nodeid="363"&gt;@Bryan Bende&lt;/A&gt;, that makes sense.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jan 2017 04:30:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126343#M51406</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-01-13T04:30:52Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126344#M51407</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/9304/tspann.html" nodeid="9304"&gt;@Timothy Spann&lt;/A&gt; you were right on the money; increasing the JVM memory did the trick for me. Thanks.&lt;/P&gt;</description>
      <pubDate>Sun, 15 Jan 2017 10:49:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126344#M51407</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-01-15T10:49:50Z</dc:date>
    </item>
    <item>
      <title>Re: Suggestions to handle high volume streaming data in NiFi</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126345#M51408</link>
      <description>&lt;P&gt;Bigger files are better than millions of little ones.&lt;/P&gt;</description>
      <pubDate>Sun, 15 Jan 2017 11:23:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Suggestions-to-handle-high-volume-streaming-data-in-NiFi/m-p/126345#M51408</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2017-01-15T11:23:22Z</dc:date>
    </item>
  </channel>
</rss>

