Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 8901 | 12-18-2020 01:46 PM |
| | 5898 | 12-16-2020 12:11 PM |
| | 4638 | 12-07-2020 01:47 PM |
| | 2797 | 12-07-2020 09:21 AM |
| | 1926 | 10-14-2020 11:15 AM |
11-06-2019
10:08 AM
1 Kudo
I updated the JIRA to include workarounds, just FYI.
11-06-2019
10:03 AM
This generally happens when files are overwritten in place (e.g. by an INSERT OVERWRITE in Hive) while Impala is still trying to read a cached version of the file. So you can often avoid the problem if you can avoid doing that. Otherwise, doing a REFRESH of the table should resolve it.
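For illustration, a minimal sketch of the REFRESH step, run from impala-shell after the Hive job has finished rewriting the files. The database and table names are hypothetical placeholders, not taken from the original question.

```sql
-- Hypothetical table name; substitute the table whose files were overwritten.
-- REFRESH reloads the file and block metadata that Impala has cached for the table.
REFRESH sales_db.events;
```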
10-23-2019
02:51 PM
I'm less familiar with Hive, but I think you have to do something like:
select date_format(UNIX_TIMESTAMP('2019-Oct-14 20:00:01.027898', 'yyyy-MMM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM-dd HH:mm:ss.SSSSSS');
10-23-2019
02:10 PM
[localhost:21000] default> select to_timestamp('2019-Oct-14 20:00:01.027898', 'yyyy-MMM-dd HH:mm:ss.SSSSSS');
Query: select to_timestamp('2019-Oct-14 20:00:01.027898', 'yyyy-MMM-dd HH:mm:ss.SSSSSS')
Query submitted at: 2019-10-23 14:08:19 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=0d4bd87f063c53a2:c8c5759b00000000
+----------------------------------------------------------------------------+
| to_timestamp('2019-oct-14 20:00:01.027898', 'yyyy-mmm-dd hh:mm:ss.ssssss') |
+----------------------------------------------------------------------------+
| 2019-10-14 20:00:01.027898000 |
+----------------------------------------------------------------------------+
Fetched 1 row(s) in 0.11s
The default timestamp format accepted by Impala is ISO 8601 (https://en.wikipedia.org/wiki/ISO_8601). to_timestamp() lets you specify a format string if you want more flexibility about input timestamp formats: https://docs.cloudera.com/documentation/enterprise/latest/topics/impala_datetime_functions.html#datetime_functions__to_timestamp. You can see above how it might work.
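As a rough side-by-side sketch of the two cases: a string that is already in the default format can, as far as I know, be cast directly, while the month-name format from the question needs to_timestamp() with an explicit pattern. The first literal is an assumed reformatting of the same value, used only for illustration.

```sql
-- Default format: a plain CAST should be enough.
SELECT CAST('2019-10-14 20:00:01.027898' AS TIMESTAMP);

-- Non-default format (abbreviated month name): supply the pattern explicitly.
SELECT to_timestamp('2019-Oct-14 20:00:01.027898', 'yyyy-MMM-dd HH:mm:ss.SSSSSS');
```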
09-20-2019
10:04 AM
@Zane- I'm late but can provide some additional insight. I think the suggestion in the error message is a good one (I'm biased because I wrote it, but some thought went into it): "Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error".

The general solution for this is to set up admission control with some memory limits so that memory doesn't get oversubscribed, and so that one query can't gobble up more memory than you like. I did a talk at Strata that gave pointers on a lot of these things - https://conferences.oreilly.com/strata/strata-ca-2019/public/schedule/detail/73000

In this case you can actually see that query 2f4b5cff11212907:886aa1400000000 is using Total=78.60 GB memory, so that's likely your problem. Impala's resource management is totally permissive out of the box and will happily let queries use up all the resources in the system like this.

I didn't see what version you're running, but there were a lot of improvements in this area (config options, OOM-avoidance, diagnostics) in CDH6.1+.

There are various other angles you can take to improve this - if the queries using lots of memory are suboptimal, tuning them (maybe just computing stats) makes a big difference. You can also
... View more
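A small sketch of two of the knobs mentioned above, as they might look from impala-shell. The table name and the limit value are hypothetical, and MEM_LIMIT is a per-query option that I'm adding here as an assumption about what would help; the admission control pool limits themselves are configured outside of SQL and are not shown.

```sql
-- Hypothetical table name: collect table and column stats so the planner
-- can pick better join orders and produce better memory estimates.
COMPUTE STATS sales_db.big_fact_table;

-- Illustrative value only: cap the memory each query in this session
-- may use on any one node, so a single query can't consume everything.
SET MEM_LIMIT=10g;
```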
07-29-2019
05:12 PM
I filed https://issues.apache.org/jira/browse/IMPALA-8807 to fix the docs.
07-29-2019
12:18 AM
That example does show that it works in at least one case with a WHERE clause referencing a partition column. I don't know off the top of my head the exact set of cases where it works, but it does seem like the docs are not totally accurate.
07-24-2019
04:28 PM
Yes! Glad you asked. There is an optimisation that can be enabled with the OPTIMIZE_PARTITION_KEY_SCANS query option: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_optimize_partition_key_scans.html. This converts queries like your example into a metadata-only query. The only reason it isn't enabled by default is that you can get different results if you have a partition that contains only files with 0 rows - the metadata doesn't have enough information to detect this case. Here it is in action:
[tarmstrong-box2.ca.cloudera.com:21000] default> set OPTIMIZE_PARTITION_KEY_SCANS = 1;
OPTIMIZE_PARTITION_KEY_SCANS set to 1
[tarmstrong-box2.ca.cloudera.com:21000] default> explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0;
Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0
+--------------------------------------------------------+
| Explain String |
+--------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=1 |
| Per-Host Resource Estimates: Memory=10MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 01:AGGREGATE [FINALIZE] |
| | output: max(ss_sold_date_sk) |
| | row-size=4B cardinality=1 |
| | |
| 00:UNION |
| constant-operands=182 |
| row-size=4B cardinality=182 |
+--------------------------------------------------------+
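Going by the documentation for this option, I'd expect the same rewrite to apply to other queries that only touch partition key columns, such as listing the distinct partition values. This is a sketch against the same TPC-DS table as above, not something I ran for this post, so check the EXPLAIN output for a UNION of constant operands rather than an HDFS scan.

```sql
SET OPTIMIZE_PARTITION_KEY_SCANS=1;

-- Only the partition key column is referenced, so this should be
-- answerable from partition metadata alone.
EXPLAIN SELECT DISTINCT ss_sold_date_sk FROM tpcds_parquet.store_sales;

-- Restore the default afterwards so later queries are unaffected.
SET OPTIMIZE_PARTITION_KEY_SCANS=0;
```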
07-15-2019
08:37 AM
1 Kudo
You shouldn't be seeing this error. I think you are probably hitting an old bug where an invalid execution plan was sometimes generated: https://issues.apache.org/jira/browse/IMPALA-5689 or https://issues.apache.org/jira/browse/IMPALA-3063. It should be fixed in CDH5.13.0 or later.
06-18-2019
02:38 PM
1 Kudo
We fixed this behaviour in CDH6.1/Impala 3.1 - there was a product limitation that queued queries couldn't be cancelled. After the fix they should be cancellable the same as any other query. For reference, https://issues.apache.org/jira/browse/IMPALA-5216