Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Impala: Control fragment number

avatar
Explorer

Hi,

For some performances measurement, I would like to control the number of fragment of a query. Is it possible?

Also, is it possible to have more than 1 fragment by impala daemon?

 

Thanks,

Julien

1 ACCEPTED SOLUTION

avatar

Hi Julien,

  I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts.

 

"Fragments" are a way of breaking down the query plan into units that can be executed in a distributed manner. You can see these in query plans with explain_level >= 2. They show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.

 

The other concept is "fragment instances", which is the number of instances of each plan fragment that are run by the query. By default you generally get 0 or 1 fragments per impala daemon, depending on whether there is any data to scan, but we will do the scanning of data in a multi-threaded way. We have a new mode, under development, where you get multiple fragments per Impala daemon, controlled by the mt_dop query option. This only works for some queries, without inserts or joins and can sometimes consume a lot more resources. mt_dop can increase throughput of queries if the bottleneck is outside of the scan, e.g. in an aggregation.

 

View solution in original post

2 REPLIES 2

avatar

Hi Julien,

  I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts.

 

"Fragments" are a way of breaking down the query plan into units that can be executed in a distributed manner. You can see these in query plans with explain_level >= 2. They show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.

 

The other concept is "fragment instances", which is the number of instances of each plan fragment that are run by the query. By default you generally get 0 or 1 fragments per impala daemon, depending on whether there is any data to scan, but we will do the scanning of data in a multi-threaded way. We have a new mode, under development, where you get multiple fragments per Impala daemon, controlled by the mt_dop query option. This only works for some queries, without inserts or joins and can sometimes consume a lot more resources. mt_dop can increase throughput of queries if the bottleneck is outside of the scan, e.g. in an aggregation.

 

avatar
Explorer

Hello,

Many thanks for the answer! The mt_dop is exactly what we need.

I hope this development will be available with impala 2_8.

 

The usecase is we are migrating from a "many small servers" cluster to a "fewer bigger servers" cluster, with a 6 time factor reduction. Even with the same hardware performances, we end up having too few fragment instances to exploit all cpu.

 

regards

Julien