Member since: 10-16-2013
Posts: 307
Kudos Received: 77
Solutions: 59
My Accepted Solutions
Views | Posted |
---|---|
11284 | 04-17-2018 04:59 PM |
6232 | 04-11-2018 10:07 PM |
3578 | 03-02-2018 09:13 AM |
22367 | 03-01-2018 09:22 AM |
2672 | 02-27-2018 08:06 AM |
04-20-2017
10:06 PM
Thanks. Trying Parquet would help. I just want to see if the high optimization time in codegen is due to some glitch with Avro.
04-20-2017
09:45 PM
As an experiment, it would be interesting to try the query with the same data using a different data format, e.g., text. You can do a quick CREATE TABLE test AS SELECT * FROM <original_table> and then retry the query.
04-20-2017
09:42 PM
Hi Maurin, thanks for posting, this is pretty interesting. What is the type of your "cuberon_event_date" column? Alex
04-19-2017
09:44 PM
1 Kudo
You are correct, this is a limitation: https://issues.cloudera.org/browse/IMPALA-2108 Impala would basically need to infer the IN predicate that you are using in your workaround. You are welcome to take a stab at contributing a patch!
04-04-2017
03:48 PM
1 Kudo
You are correct, sorry for the trouble. These known issues have been resolved as part of: https://issues.apache.org/jira/browse/IMPALA-3983 https://issues.apache.org/jira/browse/IMPALA-3974
03-21-2017
12:00 AM
1 Kudo
Hi Gatsby, if your goal is to limit the rows of table A then I think the subquery is the safest bet. Your second variant with the WHERE clause also seems fine, but depending on what's in the FROM clause that predicate may not always be applied at the scan of A. So going with a subquery seems the most straightforward. Runtime filters from right-to-left on a LEFT OUTER JOIN are not possible because restrictions on the right side cannot be directly applied to the left side. I would be surprised if MySQL had a different behavior with respect to the ON clause. Do you have an example of MySQL results being different from Impala's? Alex
03-20-2017
09:19 PM
Your second query variant should select the partition in table A. If that's not the case, something seems wrong. Could you provide a profile that shows the lack of partition pruning with the second query in your use case?

The first and second query variants have different meanings, and hence different optimizations apply. The ON and WHERE clauses have very specific meanings in SQL, in particular with outer joins. The ON clause affects which rows are considered a "match" for the purpose of the outer join, so if you put "A.yearweek = 201710" in the ON clause, then those rows not satisfying that condition are considered a join non-match. The meaning of a LEFT OUTER JOIN is that the left side rows are returned even for join non-matches (with NULLs on the right side).

The WHERE clause is logically applied *after* the FROM clause, so all rows produced by the FROM clause are filtered (including non-matches of the outer join). That is why we can move a WHERE-clause predicate on A into the scan in your second query variant.
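To make the ON versus WHERE difference concrete, here is a minimal sketch that uses SQLite from Python as a stand-in for Impala. The tables `a` and `b`, their columns, and the sample rows are made up for illustration; only the `yearweek = 201710` predicate comes from the discussion above. The join semantics shown are standard SQL, so the same result shape would be expected in Impala.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, yearweek INTEGER);
    CREATE TABLE b (id INTEGER, val TEXT);
    INSERT INTO a VALUES (1, 201710), (2, 201709);
    INSERT INTO b VALUES (1, 'x'), (2, 'y');
""")

# Variant 1: predicate on A in the ON clause.
# Rows of A that fail the predicate are join non-matches,
# so they are still returned, with NULL on the right side.
on_rows = conn.execute("""
    SELECT a.id, a.yearweek, b.val
    FROM a LEFT OUTER JOIN b
      ON a.id = b.id AND a.yearweek = 201710
    ORDER BY a.id
""").fetchall()

# Variant 2: predicate on A in the WHERE clause.
# The filter is applied after the join, so failing rows are dropped.
where_rows = conn.execute("""
    SELECT a.id, a.yearweek, b.val
    FROM a LEFT OUTER JOIN b ON a.id = b.id
    WHERE a.yearweek = 201710
    ORDER BY a.id
""").fetchall()

print(on_rows)     # both rows of A, the second with val = None
print(where_rows)  # only the row with yearweek = 201710
```

The ON variant keeps every row of `a`, which is why the predicate cannot simply be pushed into the scan of `a` there, while the WHERE variant can be.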
02-23-2017
06:47 PM
Thomas, you have a legitimate request and concern. First, there is no perfectly fool-proof solution because the resource consumption is somewhat dependent on what happens at runtime, and not all memory consumption is tracked by Impala (but most is). We are constantly making improvements in this area though.

1. I'd recommend fixing the num_scanner_threads for your queries. A different number of scanner threads can result in different memory consumption from run to run (and it also depends on what else is going on in the system at the time).
2. The operators of a query do not run one-by-one. Some of them run concurrently (e.g. join builds may execute concurrently). So just looking at the highest peak in the exec summary is not enough. Taking the sum of the peaks over all operators is a safer bet, but tends to overestimate the actual consumption.

Hope this helps!
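As a rough illustration of point 2, the sketch below compares the two estimates. The operator names and peak values are invented placeholders, not taken from a real exec summary; you would substitute the per-operator peak memory from your own query.

```python
# Hypothetical per-operator peak memory in MiB, as one might
# copy it by hand from a query's exec summary.
peak_mem_mib = {
    "00:SCAN HDFS": 120,
    "01:SCAN HDFS": 80,
    "02:HASH JOIN": 512,
    "03:AGGREGATE": 256,
}

# Optimistic estimate: assumes operators never overlap in time.
highest_single_peak = max(peak_mem_mib.values())

# Conservative estimate: assumes every operator hits its peak
# at the same moment, which tends to overestimate.
sum_of_peaks = sum(peak_mem_mib.values())

print(f"highest single peak: {highest_single_peak} MiB")
print(f"sum of peaks:        {sum_of_peaks} MiB")
```

The actual consumption should fall somewhere between the two numbers, so the sum is the safer value to budget against.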
02-21-2017
09:47 PM
Hi Joaqin, reasonable request. Could you please file a JIRA for this new feature? It would be great if someone from the community could pick this up! Thanks. Alex
02-16-2017
09:30 PM
I'm afraid there is no way to get that information with a query today. You could write a script that iterates over all databases/tables ('show databases' and then 'show tables in <db>') and then does a 'show column stats' and 'show table stats' on each table to see if column stats are there. Be mindful that these 'show' commands will cause the table metadata to be loaded completely.
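Such a script could look roughly like the sketch below. It is deliberately client-agnostic: `run_query` is a hypothetical callable that you would wire up to whatever Impala client you use, taking a statement and returning a list of row tuples. The sketch also assumes that the third column of 'show column stats' is the distinct-value count and that it is -1 when stats have not been computed; please verify that against your Impala version. `fake_run_query` and its data exist only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: takes a statement, returns rows as sequences.
RunQuery = Callable[[str], List[Sequence]]


def tables_missing_column_stats(run_query: RunQuery) -> List[Tuple[str, str]]:
    """Return (database, table) pairs that have a column without stats."""
    missing = []
    for db_row in run_query("show databases"):
        db = db_row[0]  # first column is assumed to be the database name
        for tbl_row in run_query(f"show tables in {db}"):
            tbl = tbl_row[0]
            stats = run_query(f"show column stats {db}.{tbl}")
            # Assumption: row[2] is the distinct-value count, -1 if unknown.
            if any(row[2] == -1 for row in stats):
                missing.append((db, tbl))
    return missing


# Stand-in for a real connection, with made-up metadata.
FAKE_RESULTS = {
    "show databases": [("sales", "")],
    "show tables in sales": [("orders",), ("customers",)],
    "show column stats sales.orders": [("id", "INT", 1000, 0, 4, 4.0)],
    "show column stats sales.customers": [("id", "INT", -1, -1, 4, 4.0)],
}


def fake_run_query(sql: str) -> List[Sequence]:
    return FAKE_RESULTS[sql]


result = tables_missing_column_stats(fake_run_query)
print(result)  # tables where at least one column lacks stats
```

Keep the caveat above in mind: on a large catalog, running this loop loads the metadata of every table it touches.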