
Impala Query Optimization: Does query skip unrelated columns from select


New Contributor

 

I know Impala rewrites a query so that it runs in an optimized way. Let's say I have this query:

 

select location, zipcode, car_model from (
select location, zipcode, car_model, car_color from table1
union
select location, zipcode, car_model, car_color from table2)

 

You can see that car_color is selected from the inner tables but is not required by the outer select.

Does query optimization avoid querying the extra column? Is there a command to check this?

 

Thanks.

 

5 REPLIES

Re: Impala Query Optimization: Does query skip unrelated columns from select

Guru
That's a good question. However, different versions of Impala might behave differently. What version of CDH are you using? I can try to check for you.

Cheers
Eric

Re: Impala Query Optimization: Does query skip unrelated columns from select

New Contributor

Thanks for the quick reply Eric.

We are using v2.9.0-cdh5.12.2.

Re: Impala Query Optimization: Does query skip unrelated columns from select

New Contributor

Did you get a chance to look at this, @EricL?


Re: Impala Query Optimization: Does query skip unrelated columns from select

Master Collaborator

Yes, the Impala planner will drop unused columns at various points in the plan, often in aggregations or sorts. It's not easy to enumerate precisely when it will or won't happen, but the planner does it in many cases.

 

You can look at row-size= in the extended explain plans (set the explain_level query option to 2 or higher) to get an idea of the size of the row at each point in the plan.

Re: Impala Query Optimization: Does query skip unrelated columns from select

Master Collaborator

I think that for your specific query the planner should notice that the column car_color isn't being used, but I didn't test it. You could confirm by adding or removing unused columns and seeing whether row-size changes in the explain plans.
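
A minimal sketch of that check, as it might be run from impala-shell. It uses the table and column names from the question. The inline view alias t is an addition for illustration and is not part of the original query. The sketch only shows which plans to compare; it does not claim what row-size values you will see.

```sql
-- Ask for extended explain plans so each plan node reports row-size=.
SET EXPLAIN_LEVEL=2;

-- Plan 1: the inner selects include car_color, the outer select does not use it.
EXPLAIN
select location, zipcode, car_model from (
  select location, zipcode, car_model, car_color from table1
  union
  select location, zipcode, car_model, car_color from table2) t;  -- alias t is hypothetical

-- Plan 2: the same query with car_color removed from the inner selects.
EXPLAIN
select location, zipcode, car_model from (
  select location, zipcode, car_model from table1
  union
  select location, zipcode, car_model from table2) t;
```

If row-size on the scan nodes is the same in both plans, that would suggest car_color is already being dropped in the first query. If it is larger in the first plan, the column is probably still being materialized at that point.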
