Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala: Control fragment number

Solved Go to solution
Highlighted

Impala: Control fragment number

Explorer

Hi,

For some performances measurement, I would like to control the number of fragment of a query. Is it possible?

Also, is it possible to have more than 1 fragment by impala daemon?

 

Thanks,

Julien

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Impala: Control fragment number

Hi Julien,

  I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts.

 

"Fragments" are a way of breaking down the query plan into units that can be executed in a distributed manner. You can see these in query plans with explain_level >= 2. They show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.

 

The other concept is "fragment instances", which is the number of instances of each plan fragment that are run by the query. By default you generally get 0 or 1 fragments per impala daemon, depending on whether there is any data to scan, but we will do the scanning of data in a multi-threaded way. We have a new mode, under development, where you get multiple fragments per Impala daemon, controlled by the mt_dop query option. This only works for some queries, without inserts or joins and can sometimes consume a lot more resources. mt_dop can increase throughput of queries if the bottleneck is outside of the scan, e.g. in an aggregation.

 

View solution in original post

2 REPLIES 2
Highlighted

Re: Impala: Control fragment number

Hi Julien,

  I wanted to clarify the question a bit to understand what you're trying to achieve. Impala really has two different concepts.

 

"Fragments" are a way of breaking down the query plan into units that can be executed in a distributed manner. You can see these in query plans with explain_level >= 2. They show up as sections of the plan with a heading like "F00: PLAN FRAGMENT". There are only two modes here. The default is to produce a distributed plan, which is broken up into fragments. The alternative, when the option num_nodes is set to 1, is to produce a single-node plan with only a single fragment.

 

The other concept is "fragment instances", which is the number of instances of each plan fragment that are run by the query. By default you generally get 0 or 1 fragments per impala daemon, depending on whether there is any data to scan, but we will do the scanning of data in a multi-threaded way. We have a new mode, under development, where you get multiple fragments per Impala daemon, controlled by the mt_dop query option. This only works for some queries, without inserts or joins and can sometimes consume a lot more resources. mt_dop can increase throughput of queries if the bottleneck is outside of the scan, e.g. in an aggregation.

 

View solution in original post

Highlighted

Re: Impala: Control fragment number

Explorer

Hello,

Many thanks for the answer! The mt_dop is exactly what we need.

I hope this development will be available with impala 2_8.

 

The usecase is we are migrating from a "many small servers" cluster to a "fewer bigger servers" cluster, with a 6 time factor reduction. Even with the same hardware performances, we end up having too few fragment instances to exploit all cpu.

 

regards

Julien 

Don't have an account?
Coming from Hortonworks? Activate your account here