Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Impala slow Planner Timeline with views

Highlighted

Impala slow Planner Timeline with views

Contributor

Hi, 

I am created a view on top of a table and it give me really bad performance results. It take about 2min to "Plan" the query on the views whereas it is only taking 55ms in the raw table. 
The views is really jsut a select * from table_a union select * from table_b. Where table a and b have the same definition and are both parquet tables...

 

Slow query: 
https://gist.github.com/anonymous/0af6f862a8069ee7b933deafe9cc0880


Fast query: 
https://gist.github.com/anonymous/2ef16a696527acd86795c3080edf3e91

Any idea what is happening?

thanks

3 REPLIES 3

Re: Impala slow Planner Timeline with views

Master Collaborator

Computing equivalences classes is known to be slow for complex nested queries. One issue is that your views do a "select *" and if those tables are wide (>100 columns) then that can dramatically affect the planning time.

A workaround is to select only the columns you need. I realize thia may be impractical soemtimes.

See https://issues.apache.org/jira/browse/IMPALA-4242

 

Still, a lot of time is spend during single-node planning. This is the interesting snippet from your slow profile:

Planner Timeline: 2m6s
- Analysis finished: 1s009ms (1s009ms)
- Equivalence classes computed: 1m9s (1m8s) <--- slow
- Single node plan created: 2m6s (56s716ms) <--- also slow, why?
- Runtime filters computed: 2m6s (8.364ms)
- Distributed plan created: 2m6s (46.080ms)
- Lineage info computed: 2m6s (1.754ms)
- Planning finished: 2m6s (37.579ms)

 

Would you be willing to share your table/view schemas for further investigation?

Re: Impala slow Planner Timeline with views

Contributor

Hi Alex, 
Thanks for your answer. 

I am not a huge fan of giving the exact column names, but I can tell you that both tables are paritioned by data in string format "YYYY-MM-DD" and 1798 columns of which 1795 are strings and 3 ints. 

I hope that can help. 

 

thanks

Re: Impala slow Planner Timeline with views

Master Collaborator

That's perfectly understandable. The numbers you posted are helpful. Thanks!

 

Don't have an account?
Coming from Hortonworks? Activate your account here