Support Questions

gaurang · ‎01-30-2017

Hello -

Since we recompute our data every day, I need to remove the old data and load the new data daily. We create our Parquet data files through MapReduce. To achieve ZERO downtime while switching yesterday's data with today's, I came up with the idea of having a fixed VIEW and then, after batch processing, issuing an ALTER VIEW statement to change the underlying table.

first time - CREATE VIEW table_view AS SELECT * from table_0130

daily - ALTER VIEW table_view AS SELECT * from table_0131
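
The daily swap described above could be scripted along these lines. This is only a minimal sketch: the helper name `build_alter_view` is hypothetical, the `table_MMDD` naming is inferred from the two statements above, and the resulting statement would still have to be submitted to Impala by whatever client the batch job already uses.

```python
from datetime import date

VIEW_NAME = "table_view"   # fixed view name taken from the statements above
TABLE_PREFIX = "table_"    # assumption: daily tables are named table_MMDD


def build_alter_view(day: date) -> str:
    """Return the ALTER VIEW statement that points the view at the table for `day`."""
    table = f"{TABLE_PREFIX}{day:%m%d}"
    return f"ALTER VIEW {VIEW_NAME} AS SELECT * FROM {table}"


# Build the statement for the 01-31 load.
print(build_alter_view(date(2017, 1, 31)))
```

Generating the statement from the date keeps the view name fixed for readers while only the underlying table changes.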

Most of our queries worked well. The response time did degrade slightly, but nothing alarming. For a few BIG JOIN queries, however, the response time went from 2-3 secs to 2-3 mins.

On digging further into the query profile, I found that query planning is taking 2+ mins. Why would it take so much time? The VIEW is a simple one, just a SELECT *. Are there any Impala configuration settings that can resolve this?

I appreciate any help or pointers regarding this issue.

Querying VIEW

    Planner Timeline: 2m17s
       - Analysis finished: 2s588ms (2s588ms)
       - Equivalence classes computed: 1m16s (1m13s)
       - Single node plan created: 2m17s (1m1s)
       - Distributed plan created: 2m17s (223.64ms)
       - Lineage info computed: 2m17s (2.6ms)
       - Planning finished: 2m17s (9.974ms)
    Query Timeline: 2m31s
       - Start execution: 53.597us (53.597us)
       - Planning finished: 2m26s (2m26s)
       - Ready to start remote fragments: 2m26s (63.364ms)
       - Remote fragments started: 2m31s (4s442ms)
       - Cancelled: 2m31s (5.567ms)
       - Rows available: 2m31s (35.971ms)
       - Unregister query: 2m31s (118.833us)

Querying TABLE (directly)

    Planner Timeline: 55.334ms
       - Analysis finished: 21.430ms (21.430ms)
       - Equivalence classes computed: 22.938ms (1.507ms)
       - Single node plan created: 47.813ms (24.875ms)
       - Distributed plan created: 51.913ms (4.99ms)
       - Lineage info computed: 52.394ms (481.757us)
       - Planning finished: 55.334ms (2.939ms)
    Query Timeline: 1s036ms
       - Start execution: 45.736us (45.736us)
       - Planning finished: 125.378ms (125.332ms)
       - Ready to start remote fragments: 129.281ms (3.902ms)
       - Remote fragments started: 478.56ms (348.775ms)
       - Rows available: 882.741ms (404.685ms)
       - First row fetched: 982.468ms (99.727ms)
       - Unregister query: 998.825ms (16.356ms)
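
To see which planner step dominates, the per-step durations (the values in parentheses) can be pulled out of a timeline like the ones above. The following is a rough sketch, not an Impala tool: the function names are made up for illustration, and the duration format (`2m17s`, `2s588ms`, `223.64ms`, `53.597us`) is assumed from the profile output shown in this post.

```python
import re

# Seconds per unit, for the duration strings seen in the profile output.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
# Matches "- <label>: <elapsed> (<step duration>)".
EVENT = re.compile(r"^\s*-\s*(.+?):\s*\S+\s*\((\S+)\)\s*$")


def to_seconds(text):
    """Convert a duration such as '1m13s' or '223.64ms' to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in TOKEN.findall(text))


def phase_durations(timeline):
    """Return (label, seconds) for each timeline event, using the parenthesized value."""
    phases = []
    for line in timeline.splitlines():
        match = EVENT.match(line)
        if match:
            phases.append((match.group(1), to_seconds(match.group(2))))
    return phases


VIEW_PLANNER = """\
   - Analysis finished: 2s588ms (2s588ms)
   - Equivalence classes computed: 1m16s (1m13s)
   - Single node plan created: 2m17s (1m1s)
   - Distributed plan created: 2m17s (223.64ms)
   - Lineage info computed: 2m17s (2.6ms)
   - Planning finished: 2m17s (9.974ms)
"""

phases = phase_durations(VIEW_PLANNER)
# Print the two slowest planner steps.
for label, seconds in sorted(phases, key=lambda p: p[1], reverse=True)[:2]:
    print(f"{label}: {seconds:.1f}s")
```

For the VIEW profile, this singles out equivalence class computation (1m13s) and single-node plan creation (1m1s) as the two steps that account for nearly all of the planning time.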

Lars Volker · ‎02-02-2017

@gaurang - I suspect you may be hitting IMPALA-4242. Can you reduce the number of columns you're querying?

thewayofthinkin · ‎02-01-2017

@Lars Volker

I have a question for you.

How long does the metadata loaded from the Hive metastore by the Impala Catalog Daemon stay in memory?

I'm using Impala 2.7 ( KUDU ).

It seems the metadata is flushed more often than before.

Does the catalog daemon have any configuration for the life cycle of its metadata?

@gaurang

I'm asking this question here because I guess @Lars Volker's answer can help resolve your issue.

Thank you

Gatsby

Lars Volker · ‎02-02-2017

@thewayofthinkin - I don't know for sure, but I don't think metadata is flushed periodically. There also don't seem to be any catalogd configuration options for metadata caching. Instead, the catalog should flush metadata when requested by "invalidate metadata" or "refresh", or when a DDL statement changes a table's metadata. Such changes should show up in the log files, however.

thewayofthinkin · ‎02-02-2017

Yep, you're right. I will take a look at the log.

Thank you
Gatsby

thewayofthinkin · ‎02-01-2017

@gaurang

Today, I had an issue with slow queries.

The issue was related to the metadata that the Catalog Daemon caches.

How often do you run queries against that TABLE/VIEW? (I don't think your issue is related to the VIEW.)

In my case, the metadata for the TABLE was reloaded very often because the Catalog Daemon kept flushing it out.

Take a look at your catalog daemon and check whether the TABLE metadata is cached.

Gatsby

alex.behm · ‎02-02-2017

@gaurang would you be open to sharing your CREATE TABLE statements, your CREATE VIEW statement, and the query that has the slow planning time? No need for the data; the statements alone should be sufficient for us to better understand what's going on.

Like Lars said, you are probably hitting IMPALA-4242, which explains the slow equivalence class computation, but I'd also like to understand the slow single-node planning time.

Thanks!

Cloudera Community

When querying a VIEW, query planning takes a long time