<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: ExecuteSQL query is very slow in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/152915#M115380</link>
    <description>&lt;P&gt;The approach in the other thread is very inefficient for this use case. You're basically trying to do a join between rows in a file and rows in a DB table. An alternative is to populate a DistributedMapCacheServer from the DB table, then look up those values in a separate flow.&lt;/P&gt;&lt;P&gt;To populate the map, you could do something like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="10276-populatelookuptable.png" style="width: 386px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21072i5969EDD7BDA291E2/image-size/medium?v=v2&amp;amp;px=400" role="button" title="10276-populatelookuptable.png" alt="10276-populatelookuptable.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Here I am using QueryDatabaseTable with a Max Value Column of "id" such that the map will only be populated once. But if you are adding entries to the lookup table (as it appears you might be from your description) or if new entries will not have strictly greater values for "id", then you can remove the Max Value Column property and schedule the QueryDatabaseTable processor to run as often as you'd like to refresh the values.&lt;/P&gt;&lt;P&gt;Once this flow is running, you can start a different flow that is similar to the one in the other thread, but instead of querying the DB for each row in the file, it will fetch from the DistributedCacheMapServer, which is hopefully faster:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="10277-performlookup.png" style="width: 976px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21073i3612CDEBAC42757F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="10277-performlookup.png" alt="10277-performlookup.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;You can see the first part is the same as the flow in the other thread, but instead of using ReplaceText to generate SQL to execute, the value is simply looked up from the Map and put into an attribute, then the final ReplaceText is like the one in the other thread, specifying "${column.1},${column.2},${column.3},${column.4}, ${customer.name}" or whatever the appropriate attributes are. I have attached a template (&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10278-databaselookupexample.xml" target="_blank"&gt;databaselookupexample.xml&lt;/A&gt;) showing these two flows.&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 12:42:23 GMT</pubDate>
    <dc:creator>mburgess</dc:creator>
    <dc:date>2019-08-18T12:42:23Z</dc:date>
    <item>
      <title>ExecuteSQL query is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/152914#M115379</link>
      <description>&lt;P&gt;Hello guys,&lt;/P&gt;
&lt;P&gt;I have the below NiFi flow, based on the thread&lt;/P&gt;
&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/70562/executesql-dynamic-query.html#comment-71349" target="_blank" rel="nofollow noopener noreferrer"&gt;https://community.hortonworks.com/questions/70562/executesql-dynamic-query.html#comment-71349&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The ExecuteSQL processor is executing very simple select query which returns the results very fast if executed from command line..&lt;/P&gt;
&lt;P&gt;I dunno why I keep getting huge queue before the executeSQL, I tried to increase the running threads but still the output is slow...&lt;/P&gt;
&lt;P&gt;any suggestion how to improve this ? bearing in mind that I have another process group which execute insert query and running smoothly, fast ?&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="10300-view.png" style="width: 1101px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21074i84657B0E32E48885/image-size/medium?v=v2&amp;amp;px=400" role="button" title="10300-view.png" alt="10300-view.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Note the queue is full before ExectueSQL, I also got heap size errors when I increased the ExecuteSQL threads to 16 ?&lt;/P&gt;
&lt;P&gt;I attached the flow as well&lt;/P&gt;
&lt;P&gt;Thanks forum&lt;/P&gt;</description>
      <pubDate>Wed, 02 Oct 2019 16:16:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/152914#M115379</guid>
      <dc:creator>yahya_najjar</dc:creator>
      <dc:date>2019-10-02T16:16:06Z</dc:date>
    </item>
    <item>
      <title>Re: ExecuteSQL query is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/152915#M115380</link>
      <description>&lt;P&gt;The approach in the other thread is very inefficient for this use case. You're basically trying to do a join between rows in a file and rows in a DB table. An alternative is to populate a DistributedMapCacheServer from the DB table, then look up those values in a separate flow.&lt;/P&gt;&lt;P&gt;To populate the map, you could do something like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="10276-populatelookuptable.png" style="width: 386px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21072i5969EDD7BDA291E2/image-size/medium?v=v2&amp;amp;px=400" role="button" title="10276-populatelookuptable.png" alt="10276-populatelookuptable.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Here I am using QueryDatabaseTable with a Max Value Column of "id" such that the map will only be populated once. But if you are adding entries to the lookup table (as it appears you might be from your description) or if new entries will not have strictly greater values for "id", then you can remove the Max Value Column property and schedule the QueryDatabaseTable processor to run as often as you'd like to refresh the values.&lt;/P&gt;&lt;P&gt;Once this flow is running, you can start a different flow that is similar to the one in the other thread, but instead of querying the DB for each row in the file, it will fetch from the DistributedCacheMapServer, which is hopefully faster:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="10277-performlookup.png" style="width: 976px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21073i3612CDEBAC42757F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="10277-performlookup.png" alt="10277-performlookup.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;You can see the first part is the same as the flow in the other thread, but instead of using ReplaceText to generate SQL to execute, the value is simply looked up from the Map and put into an attribute, then the final ReplaceText is like the one in the other thread, specifying "${column.1},${column.2},${column.3},${column.4}, ${customer.name}" or whatever the appropriate attributes are. I have attached a template (&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/10278-databaselookupexample.xml" target="_blank"&gt;databaselookupexample.xml&lt;/A&gt;) showing these two flows.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 12:42:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/152915#M115380</guid>
      <dc:creator>mburgess</dc:creator>
      <dc:date>2019-08-18T12:42:23Z</dc:date>
    </item>
    <item>
      <title>Re: ExecuteSQL query is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/152916#M115381</link>
      <description>&lt;P&gt;FetchDistrubutedMapCash is still not fast, though the table which was fetched is only 500 rows..&lt;/P&gt;&lt;P&gt;any suggestion.. ?&lt;/P&gt;&lt;P&gt;Also..&lt;/P&gt;&lt;P&gt;how can I reset the DistributedMapCacheServer content ?&lt;/P&gt;&lt;P&gt;I'm using  the FetchDistributedMapCache to compare ${column.3} with DistributedMapCacheServer content, is it possible to use the same processor to compare ${column.4},  ? &lt;/P&gt;&lt;P&gt;Thank you &lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Dec 2016 19:16:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/152916#M115381</guid>
      <dc:creator>yahya_najjar</dc:creator>
      <dc:date>2016-12-15T19:16:17Z</dc:date>
    </item>
    <item>
      <title>Re: ExecuteSQL query is very slow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/278664#M208156</link>
      <description>The prefetch (fetch value) option of jdbc can improve your perfomance a lot. You can add this options as dynamic option in the connection pool. My performance improves with a factor 20 when I raised it to 2000. JDBC uses a default value 10 when you don't specify a value.</description>
      <pubDate>Wed, 02 Oct 2019 09:11:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ExecuteSQL-query-is-very-slow/m-p/278664#M208156</guid>
      <dc:creator>rudolf_schimmel</dc:creator>
      <dc:date>2019-10-02T09:11:43Z</dc:date>
    </item>
  </channel>
</rss>

