Created on 01-30-2017 01:32 PM - edited 09-16-2022 03:59 AM
Hello -
As we recompute our data every day, I need to remove the old data and load the new data daily. We create our Parquet data files through MapReduce. To achieve ZERO downtime when switching from yesterday's data to today's, I came up with the idea of having a fixed VIEW and, after batch processing, issuing an ALTER VIEW statement to change the underlying table.
first time - CREATE VIEW table_view AS SELECT * from table_0130
daily - ALTER VIEW table_view AS SELECT * from table_0131
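The daily switch described above could be generated by a small helper. The following is only a minimal sketch: the function name `alter_view_statement` and its parameters are hypothetical, and it assumes the tables keep the `table_MMDD` naming seen in the two statements above.

```python
from datetime import date


def alter_view_statement(day: date, view: str = "table_view", prefix: str = "table_") -> str:
    """Build the ALTER VIEW statement that repoints `view` at the table for `day`.

    Assumes (hypothetically) that each day's table is named <prefix><MMDD>,
    e.g. table_0131 for January 31.
    """
    table = f"{prefix}{day:%m%d}"
    return f"ALTER VIEW {view} AS SELECT * FROM {table}"


# Statement for the batch that finished on 2017-01-31
print(alter_view_statement(date(2017, 1, 31)))
# -> ALTER VIEW table_view AS SELECT * FROM table_0131
```

The resulting string would then be submitted through whatever client the batch job already uses to talk to Impala.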
Most of our queries worked well. The response time did degrade slightly, but nothing alarming. But for a few BIG JOIN queries, the response time went from 2-3 seconds to 2-3 minutes.
On further digging into the query profile, I found that query planning is taking 2+ minutes. Why would it take so much time? The VIEW is a simple one, just a SELECT *. Are there any Impala configuration settings that can resolve this?
I appreciate any help or pointers regarding this issue.
Querying VIEW
Planner Timeline: 2m17s
  - Analysis finished: 2s588ms (2s588ms)
  - Equivalence classes computed: 1m16s (1m13s)
  - Single node plan created: 2m17s (1m1s)
  - Distributed plan created: 2m17s (223.64ms)
  - Lineage info computed: 2m17s (2.6ms)
  - Planning finished: 2m17s (9.974ms)
Query Timeline: 2m31s
  - Start execution: 53.597us (53.597us)
  - Planning finished: 2m26s (2m26s)
  - Ready to start remote fragments: 2m26s (63.364ms)
  - Remote fragments started: 2m31s (4s442ms)
  - Cancelled: 2m31s (5.567ms)
  - Rows available: 2m31s (35.971ms)
  - Unregister query: 2m31s (118.833us)
Querying TABLE (directly)
Planner Timeline: 55.334ms
  - Analysis finished: 21.430ms (21.430ms)
  - Equivalence classes computed: 22.938ms (1.507ms)
  - Single node plan created: 47.813ms (24.875ms)
  - Distributed plan created: 51.913ms (4.99ms)
  - Lineage info computed: 52.394ms (481.757us)
  - Planning finished: 55.334ms (2.939ms)
Query Timeline: 1s036ms
  - Start execution: 45.736us (45.736us)
  - Planning finished: 125.378ms (125.332ms)
  - Ready to start remote fragments: 129.281ms (3.902ms)
  - Remote fragments started: 478.56ms (348.775ms)
  - Rows available: 882.741ms (404.685ms)
  - First row fetched: 982.468ms (99.727ms)
  - Unregister query: 998.825ms (16.356ms)
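To see at a glance which planning step dominates, the per-step durations (the values in parentheses) can be pulled out of a timeline like the ones above. This is a rough sketch, not an Impala tool: the names `to_seconds` and `step_durations` are hypothetical, and the parsing assumes the `- Step name: elapsed (delta)` layout shown in these profiles.

```python
import re

# Seconds per unit suffix as it appears in the profile text.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Two-letter units are listed first so "588ms" is not read as minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
# One timeline entry: "- <name>: <elapsed> (<delta>)"
_STEP = re.compile(r"-\s*([^:]+?):\s*(\S+)\s*\((\S+?)\)")


def to_seconds(text: str) -> float:
    """Convert a duration such as '1m13s', '2s588ms' or '223.64ms' to seconds."""
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in _PART.findall(text))


def step_durations(timeline: str) -> dict:
    """Map each step name to the time spent in that step (the parenthesised delta)."""
    return {name.strip(): to_seconds(delta) for name, _elapsed, delta in _STEP.findall(timeline)}


# Planner timeline of the slow query against the VIEW, copied from the post.
VIEW_PLANNER_TIMELINE = (
    "Planner Timeline: 2m17s - Analysis finished: 2s588ms (2s588ms) "
    "- Equivalence classes computed: 1m16s (1m13s) "
    "- Single node plan created: 2m17s (1m1s) "
    "- Distributed plan created: 2m17s (223.64ms) "
    "- Lineage info computed: 2m17s (2.6ms) "
    "- Planning finished: 2m17s (9.974ms)"
)

steps = step_durations(VIEW_PLANNER_TIMELINE)
slowest = max(steps, key=steps.get)
print(slowest)  # -> Equivalence classes computed
```

For the VIEW query, this points at equivalence class computation (1m13s) and single-node plan creation (1m1s) as the two steps that account for almost all of the planning time.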
Created 02-02-2017 06:00 AM
@gaurang - I suspect you may be hitting IMPALA-4242. Can you reduce the number of columns you're querying?
Created 02-01-2017 12:44 PM
I have a question for you.
How long does the metadata loaded from the Hive metastore by the Impala Catalog Daemon stay in memory?
I'm using Impala 2.7 (Kudu).
It seems the metadata is flushed more often than before.
Is there any configuration for the life cycle of metadata in the catalog daemon?
I'm asking this question here because I guess @Lars Volker's answer can help resolve your issue.
Thank you
Gatsby
Created 02-02-2017 06:02 AM
@thewayofthinkin - I don't know for sure, but I don't think metadata is flushed periodically. There also don't seem to be any catalogd configuration options for metadata caching. Instead, the catalog should flush metadata when requested by "invalidate metadata" or "refresh", or when a DDL statement makes changes to a table's metadata. Such changes should show up in the logfiles, however.
Created 02-02-2017 09:26 AM
Created 02-01-2017 02:09 PM
Today, I had an issue with slow queries.
The issue was related to the metadata that the Catalog Daemon caches.
How often do you query that TABLE/VIEW? (I don't think your issue is related to the VIEW.)
In my case, the metadata for the TABLE was reloaded very often because the Catalog Daemon flushes out metadata.
Take a look at your catalog daemon and check whether the TABLE metadata is cached.
Gatsby
Created 02-02-2017 10:07 PM
@gaurang would you be open to sharing your CREATE TABLEs, CREATE VIEW and the query that has slow planning time? No need for the data, just that should be sufficient for us to understand better what's going on.
Like Lars said, you are probably hitting IMPALA-4242 which explains the slow equivalence class computation, but I'd also like to understand the slow single-node planning time.
Thanks!