<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Impala: Control fragment number in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Control-fragment-number/m-p/62311#M71893</link>
    <description>&lt;P&gt;Hi Julien,&lt;/P&gt;&lt;P&gt;I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"Fragments" are a way of breaking the query plan down into units that can be executed in a distributed manner. You can see them in query plans with explain_level &amp;gt;= 2, where they show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the query option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The other concept is "fragment instances", which is the number of instances of each plan fragment that the query runs. By default you generally get 0 or 1 instances of each fragment per Impala daemon, depending on whether that daemon has any data to scan, although the scan itself is multi-threaded. We have a new mode, still under development, where you get multiple fragment instances per Impala daemon, controlled by the mt_dop query option. It only works for some queries (those without inserts or joins) and can sometimes consume a lot more resources. mt_dop can increase query throughput if the bottleneck is outside the scan, e.g. in an aggregation.&lt;/P&gt;</description>
    <pubDate>Tue, 28 Nov 2017 20:22:15 GMT</pubDate>
    <dc:creator>Tim Armstrong</dc:creator>
    <dc:date>2017-11-28T20:22:15Z</dc:date>
    <item>
      <title>Impala: Control fragment number</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Control-fragment-number/m-p/62287#M71892</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;For some performance measurements, I would like to control the number of fragments in a query. Is that possible?&lt;/P&gt;&lt;P&gt;Also, is it possible to have more than one fragment per Impala daemon?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Julien&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:34:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Control-fragment-number/m-p/62287#M71892</guid>
      <dc:creator>JulienMaria</dc:creator>
      <dc:date>2022-09-16T12:34:33Z</dc:date>
    </item>
    <item>
      <title>Re: Impala: Control fragment number</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Control-fragment-number/m-p/62311#M71893</link>
      <description>&lt;P&gt;Hi Julien,&lt;/P&gt;&lt;P&gt;I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"Fragments" are a way of breaking the query plan down into units that can be executed in a distributed manner. You can see them in query plans with explain_level &amp;gt;= 2, where they show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the query option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The other concept is "fragment instances", which is the number of instances of each plan fragment that the query runs. By default you generally get 0 or 1 instances of each fragment per Impala daemon, depending on whether that daemon has any data to scan, although the scan itself is multi-threaded. We have a new mode, still under development, where you get multiple fragment instances per Impala daemon, controlled by the mt_dop query option. It only works for some queries (those without inserts or joins) and can sometimes consume a lot more resources. mt_dop can increase query throughput if the bottleneck is outside the scan, e.g. in an aggregation.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Nov 2017 20:22:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Control-fragment-number/m-p/62311#M71893</guid>
      <dc:creator>Tim Armstrong</dc:creator>
      <dc:date>2017-11-28T20:22:15Z</dc:date>
    </item>
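    <!--
    One way to see why mt_dop fits the use case in this thread is a rough instance count.
    The sketch is a simplified, hypothetical model, not Impala's actual scheduler. It
    assumes that mt_dop=0 (the default) behaves like one instance of a scan fragment per
    Impala daemon, that a daemon never runs more instances than it has scan ranges, and
    that a daemon with no scan ranges runs none. The function names and the 60 and 10
    node counts are hypothetical and chosen only for illustration.

```python
# Simplified, hypothetical model of scan-fragment instance counts.
# This is NOT Impala's scheduler; it only illustrates the arithmetic.


def scan_fragment_instances(nodes: int, mt_dop: int, scan_ranges_per_node: int) -> int:
    """Estimate total instances of one scan fragment across the cluster.

    Assumptions (hypothetical): mt_dop=0 acts like one instance per daemon,
    a daemon never runs more instances than it has scan ranges, and a daemon
    with no scan ranges runs zero instances.
    """
    if nodes < 0 or mt_dop < 0 or scan_ranges_per_node < 0:
        raise ValueError("arguments must be non-negative")
    per_node = min(max(1, mt_dop), scan_ranges_per_node)
    return nodes * per_node


def mt_dop_to_match(old_nodes: int, new_nodes: int) -> int:
    """Smallest mt_dop that roughly keeps the old instance count on fewer nodes."""
    if old_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    return -(-old_nodes // new_nodes)  # ceiling division


if __name__ == "__main__":
    # Hypothetical cluster sizes: 60 small servers replaced by 10 bigger ones.
    before = scan_fragment_instances(60, mt_dop=0, scan_ranges_per_node=100)
    after = scan_fragment_instances(10, mt_dop=0, scan_ranges_per_node=100)
    dop = mt_dop_to_match(60, 10)
    tuned = scan_fragment_instances(10, mt_dop=dop, scan_ranges_per_node=100)
    print(before, after, dop, tuned)  # 60 10 6 60
```

    Under these assumptions, shrinking the cluster by a factor of 6 also shrinks the
    instance count by 6 unless mt_dop grows to compensate. Real behavior also depends on
    the query shape and on the mt_dop restrictions described in the answer above.
    -->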
    <item>
      <title>Re: Impala: Control fragment number</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Control-fragment-number/m-p/62336#M71894</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Many thanks for the answer! &lt;SPAN&gt;mt_dop is exactly what we need.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I hope this feature will be available in Impala 2.8.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The use case: we are migrating from a cluster of "many small servers" to one with "fewer, bigger servers", reducing the number of servers by a factor of 6. Even with the same overall hardware performance, we end up with too few fragment instances to exploit all the CPUs.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Julien&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Nov 2017 09:04:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Control-fragment-number/m-p/62336#M71894</guid>
      <dc:creator>JulienMaria</dc:creator>
      <dc:date>2017-11-29T09:04:45Z</dc:date>
    </item>
  </channel>
</rss>

