- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
When querying a VIEW, query planning takes a long time
- Labels:
-
Apache Impala
Created on 01-30-2017 01:32 PM - edited 09-16-2022 03:59 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello -
As we are recomputing data everyday, I need remove old data and load new data everyday. We create our parquet data files through Map Reduce. So in order to reach ZERO downtime during switching yesterday's data with today's data, I came up with the idea of having a fixed VIEW and then after batch processing issue a ALTER VIEW statement to change the underlying table.
first time - CREATE VIEW table_view AS SELECT * from table_0130
daily - ALTER VIEW table_view AS SELECT * from table_0131
Most of our queries worked well. The response time did degrade slightly but nothing alarming. But for few BIG JOIN queries, the response time went from 2-3 secs to 2-3 mins.
On further digging into query profile, I found that the query planning is taking 2+ mins. Why would it take so much time? The VIEW is a simple one, just a SELECT *. Any impala conf settings that can resolve this?
I appreciate any help, pointers regarding this issue.
Querying VIEW
Planner Timeline: 2m17s - Analysis finished: 2s588ms (2s588ms) - Equivalence classes computed: 1m16s (1m13s) - Single node plan created: 2m17s (1m1s) - Distributed plan created: 2m17s (223.64ms) - Lineage info computed: 2m17s (2.6ms) - Planning finished: 2m17s (9.974ms) Query Timeline: 2m31s - Start execution: 53.597us (53.597us) - Planning finished: 2m26s (2m26s) - Ready to start remote fragments: 2m26s (63.364ms) - Remote fragments started: 2m31s (4s442ms) - Cancelled: 2m31s (5.567ms) - Rows available: 2m31s (35.971ms) - Unregister query: 2m31s (118.833us)
Querying TABLE (directly)
Planner Timeline: 55.334ms - Analysis finished: 21.430ms (21.430ms) - Equivalence classes computed: 22.938ms (1.507ms) - Single node plan created: 47.813ms (24.875ms) - Distributed plan created: 51.913ms (4.99ms) - Lineage info computed: 52.394ms (481.757us) - Planning finished: 55.334ms (2.939ms) Query Timeline: 1s036ms - Start execution: 45.736us (45.736us) - Planning finished: 125.378ms (125.332ms) - Ready to start remote fragments: 129.281ms (3.902ms) - Remote fragments started: 478.56ms (348.775ms) - Rows available: 882.741ms (404.685ms) - First row fetched: 982.468ms (99.727ms) - Unregister query: 998.825ms (16.356ms)
Created 02-02-2017 06:00 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@gaurang - I suspect you may be hitting IMPALA-4242. Can you reduce the number of columns you're querying?
Created 01-31-2017 12:37 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi gaurang,
Does the long query time only occur the first time you query a view after altering it? In that case if could be related to metadata loading for the underlying table. How many partitions and files do these tables typically have? You could also check the log files for anything suspicious.
Cheers, Lars
Created 01-31-2017 03:46 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a question for you.
Does VIEW still need to load metadata separately even if metadata of TABLE for VIEW is already loaded?
which CDH/Impala are you using?
Thank you
Gatsby
Created 01-31-2017 04:01 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@thewayofthinkin - No, the view should get expanded and then the resulting query will access the already loaded metadata of the table. From the order of the queries it looked like the view may be queried first, then the table, which could explain why the former took much longer than the latter.
Created on 01-31-2017 04:11 PM - edited 01-31-2017 04:12 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Created 01-31-2017 05:42 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Created 02-02-2017 06:00 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@gaurang - I suspect you may be hitting IMPALA-4242. Can you reduce the number of columns you're querying?
Created 02-02-2017 11:27 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In my case though, I cannot select a set of columns, because the VIEW needs to be queries across the application. Maybe when we get a chance to change the front-end queries, we will try to implement this. Create different views for different reports, selecting just required columns from the underlying table.
Created 01-31-2017 05:38 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Lars-
No. Everytime I fire the query, it takes this long. I haven't checked logs yet. Went through the query profile, and found that query planning is what killing me. Will check logs to see if I can find anything.
Thanks for your response.
Regards,
Gaurang
Created 01-31-2017 05:40 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi guarang,
Can you post the version of CDH and Impala you're using?
Thanks, Lars