Impala: Control fragment number

JulienMaria — Fri, 16 Sep 2022 12:34:33 GMT

Hi,

For some performance measurements, I would like to control the number of fragments in a query. Is that possible?

Also, is it possible to have more than one fragment per Impala daemon?

Thanks,

Julien

Re: Impala: Control fragment number

Tim Armstrong — Tue, 28 Nov 2017 20:22:15 GMT

Hi Julien,

I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts.

"Fragments" are a way of breaking down the query plan into units that can be executed in a distributed manner. You can see these in query plans with explain_level >= 2. They show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.
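For anyone who wants to check how many fragments a plan has, here is a minimal sketch that counts the fragment headings in saved EXPLAIN output. The helper name `fragment_ids` and the abbreviated sample plan are made up for illustration; real plans carry more detail on each heading line, and the pattern only assumes that headings look like the "F00: PLAN FRAGMENT" form described above (with or without the space after the colon).

```python
import re
from typing import List

# Matches headings such as "F00: PLAN FRAGMENT" or "F00:PLAN FRAGMENT".
FRAGMENT_HEADING = re.compile(r"^\s*(F\d+):\s*PLAN FRAGMENT", re.MULTILINE)


def fragment_ids(explain_text: str) -> List[str]:
    """Return distinct fragment ids in order of first appearance."""
    seen: List[str] = []
    for match in FRAGMENT_HEADING.finditer(explain_text):
        frag_id = match.group(1)
        if frag_id not in seen:
            seen.append(frag_id)
    return seen


# Hypothetical, heavily abbreviated plan text, for illustration only.
SAMPLE_PLAN = """\
F01: PLAN FRAGMENT
  02:EXCHANGE
F00: PLAN FRAGMENT
  00:SCAN HDFS
"""

if __name__ == "__main__":
    ids = fragment_ids(SAMPLE_PLAN)
    print(len(ids), ids)  # prints: 2 ['F01', 'F00']
```

A plan produced with num_nodes set to 1 should, per the description above, yield a single id.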

The other concept is "fragment instances": the copies of each plan fragment that the query actually runs. By default you generally get 0 or 1 instances of each fragment per Impala daemon, depending on whether there is any data to scan, although the scanning itself is done in a multi-threaded way. We have a new mode, under development, where you get multiple instances of each fragment per Impala daemon, controlled by the mt_dop query option. This only works for some queries (those without inserts or joins) and can sometimes consume a lot more resources. mt_dop can increase the throughput of queries if the bottleneck is outside of the scan, e.g. in an aggregation.
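The three query options mentioned in this reply can be set per session before running a query. Below is a small sketch that renders the corresponding SET statements for pasting into impala-shell. The helper `set_statements` and its checks are my own assumptions, not part of Impala; only the option names come from the discussion above, and 0 is used here as the "default behaviour" value for num_nodes and mt_dop.

```python
from typing import List


def set_statements(explain_level: int = 2,
                   num_nodes: int = 0,
                   mt_dop: int = 0) -> List[str]:
    """Render SET statements for the options discussed in the thread.

    num_nodes: 1 asks for a single-node plan, 0 keeps the distributed plan.
    mt_dop:    0 keeps the default mode; higher values request multiple
               fragment instances per daemon (only for supported queries).
    """
    if num_nodes not in (0, 1):
        raise ValueError("num_nodes is expected to be 0 or 1")
    if mt_dop < 0:
        raise ValueError("mt_dop must not be negative")
    return [
        f"SET EXPLAIN_LEVEL={explain_level};",
        f"SET NUM_NODES={num_nodes};",
        f"SET MT_DOP={mt_dop};",
    ]


if __name__ == "__main__":
    # Example: distributed plan, four instances per daemon requested.
    for statement in set_statements(mt_dop=4):
        print(statement)
```

Running an EXPLAIN with these settings is the easiest way to confirm whether a given query is actually eligible for the multi-threaded mode.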

Re: Impala: Control fragment number

JulienMaria — Wed, 29 Nov 2017 09:04:45 GMT

Hello,

Many thanks for the answer! The mt_dop option is exactly what we need.

I hope this feature will be available in Impala 2.8.

The use case is that we are migrating from a cluster of many small servers to a cluster of fewer, bigger servers, with a sixfold reduction in the number of servers. Even with the same overall hardware performance, we end up with too few fragment instances to exploit all the CPUs.

Regards,

Julien
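The sizing argument in the last post can be written down as a rough estimate. This sketch assumes that the number of instances of a fragment scales as (number of daemons) x (mt_dop, or 1 when mt_dop is unset), which is a simplification of what the planner really does; the function name and the host counts are hypothetical, since the thread only gives the factor of 6.

```python
def mt_dop_to_match(old_hosts: int, new_hosts: int) -> int:
    """Smallest mt_dop that gives at least as many instances as before.

    Assumes one instance per daemon on the old cluster and
    new_hosts * mt_dop instances on the new one (a simplification).
    """
    if old_hosts <= 0 or new_hosts <= 0:
        raise ValueError("host counts must be positive")
    # Integer ceiling division: ceil(old_hosts / new_hosts).
    return -(-old_hosts // new_hosts)


if __name__ == "__main__":
    # Hypothetical cluster sizes with the sixfold reduction from the post.
    print(mt_dop_to_match(old_hosts=60, new_hosts=10))  # prints: 6
```

Whether the larger servers can actually sustain that many instances depends on cores and memory per daemon, so the result is a starting point for measurement rather than a recommendation.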