Member since 07-29-2015. 535 posts, 141 kudos received, 103 solutions.
02-02-2017 10:42 AM (1 Kudo)
I think this is related to IMPALA-4610 (https://issues.cloudera.org/browse/IMPALA-4610). I think you've already discovered the workaround of using full subqueries.
02-02-2017 10:08 AM
If you want to experiment, you could try setting the query options "PREFETCH_MODE=0" and "DISABLE_STREAMING_PREAGGREGATIONS=1". You could also try adjusting NUM_SCANNER_THREADS: we recently changed our recommendation to set it to the number of logical cores on the machine (originally it was 3x the number of logical cores; we changed the default in a later release). You can set these as defaults with the impalad startup option "default_query_options".
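As a rough sketch of that experiment, the Python below gathers the three options in one place and renders them both as impala-shell SET statements (which only affect the current session) and as a value for the default_query_options startup flag. The function names are hypothetical, and the comma-separated format of the startup value is an assumption; check the documentation for your Impala release. Note that os.cpu_count() reports the logical CPUs of the machine running the script, so run it on an impalad host or pass the core count explicitly.

```python
import os


def experiment_options(logical_cores=None):
    """Collect the query options suggested above into one dict.

    NUM_SCANNER_THREADS follows the newer recommendation of one scanner
    thread per logical core (the older recommendation was 3x that).
    """
    if logical_cores is None:
        # Logical CPUs on the machine running this script; os.cpu_count() may return None.
        logical_cores = os.cpu_count() or 1
    return {
        "PREFETCH_MODE": "0",
        "DISABLE_STREAMING_PREAGGREGATIONS": "1",
        "NUM_SCANNER_THREADS": str(logical_cores),
    }


def shell_set_statements(options):
    """Render options as impala-shell SET statements (one session only)."""
    return [f"SET {name}={value};" for name, value in options.items()]


def default_query_options_flag(options):
    """Render options as an impalad startup flag.

    ASSUMPTION: comma-separated key=value pairs; confirm the exact format
    in the documentation for your Impala release.
    """
    pairs = ",".join(f"{name}={value}" for name, value in options.items())
    return f"--default_query_options={pairs}"


if __name__ == "__main__":
    opts = experiment_options()
    for statement in shell_set_statements(opts):
        print(statement)
    print(default_query_options_flag(opts))
```

Trying the SET statements in a single impala-shell session first keeps the experiment isolated; only move an option into the startup flag once it has clearly helped.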
02-02-2017 10:04 AM
Can you share the query and its explain plan? I'm interested to know which data types and aggregate functions you're using. We're not aware of any performance regressions between those versions; in fact, we made a number of improvements, e.g. changing the hash table implementation and adding software prefetching. It's possible that your workload hit one of the edge cases where these changes had a detrimental effect.
01-31-2017 05:35 PM (2 Kudos)
There's a bit of a story there. When we started preparing the CDH 5.10 release, the Apache Impala 2.8 release was not ready, so we had to call it "Impala 2.7" in the version number. Impala 2.8 was officially released after we finished putting together the CDH 5.10 release, which was too late to bump the version everywhere. CDH 5.10 Impala is almost exactly the same as 2.8, plus or minus a few patches, so in most of the announcements we've just called it 2.8.
01-31-2017 05:30 PM (3 Kudos)
Hi Sanjumani, my guess is that the query wasn't able to get enough memory because of other concurrent queries. It consumed only 160.58MB of memory, and I think it probably couldn't get more. If you have access to the Impala debug web UI, you can look at http://hostname:25000/queries to see what other queries are running on that coordinator, and http://hostname:25000/memz?detailed=true to see what is consuming memory on each host. It's also worth confirming Impala's memory limit setting: you can see "mem_limit" on http://hostname:25000/varz. - Tim
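If you want to capture those pages for later comparison, a small Python helper like the one below can build the URLs and download them. It is a sketch assuming the default debug port 25000 and a web UI without authentication or TLS; the helper names and the hostname in the usage note are hypothetical.

```python
import urllib.request

# Default port of the impalad debug web UI.
DEFAULT_DEBUG_PORT = 25000


def debug_pages(host, port=DEFAULT_DEBUG_PORT):
    """Return the debug UI pages mentioned above for one impalad."""
    base = f"http://{host}:{port}"
    return {
        "queries": f"{base}/queries",          # queries running on this coordinator
        "memz": f"{base}/memz?detailed=true",  # what is consuming memory on this host
        "varz": f"{base}/varz",                # startup flags, including mem_limit
    }


def fetch(url, timeout=10):
    """Download one page as text. Needs network access; no auth or TLS handling."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def snapshot(host, port=DEFAULT_DEBUG_PORT):
    """Fetch all three pages from one host so they can be saved and compared."""
    return {name: fetch(url) for name, url in debug_pages(host, port).items()}
```

For example, `pages = snapshot("impalad-1.example.com")` followed by `"mem_limit" in pages["varz"]` is a quick check that the flag shows up; repeat per host, since memz only reflects the daemon you query.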
01-31-2017 05:25 PM
We don't support UDFs messing around with Impala's runtime data structures. We don't expose this to UDFs since UDFs aren't really meant to do things like I/O.
01-30-2017 10:31 AM
Hi Akhil, it is technically possible to read files from an Impala C++ UDF, since we don't sandbox UDFs. However, I would strongly recommend against this, because it can lead to resource usage problems and any mistakes can compromise the stability of Impala. My recommendation is to rethink your workflow to avoid the need for this kind of approach if possible. - Tim
01-27-2017 08:19 AM
That's a good point. I updated the JIRA description to provide that additional motivation. As an open-source project, we're somewhat dependent on people finding time to pick up new features like this that are nice-to-have but not critical for many users.
01-26-2017 09:53 AM (1 Kudo)
This blog post provides a nice introduction to Impala's admission control: https://blog.cloudera.com/blog/2016/12/resource-management-for-apache-impala-incubating/

There are a few ways to inspect a query's memory usage. The query profile and summary include peak memory usage per host and for each operator in the query. These stats are generally per host rather than cluster-wide aggregates. If you're using impala-shell, you can also run "set live_summary=1" to get live updates as the query makes progress. As an admin, if you want to see the live state of all queries running on an Impala daemon, you can look at the "memz" tab of the Impala web UI (port 25000 by default), which shows the full tree of tracked memory from the process down to the operator level. Cloudera Manager also has various charts of aggregate memory usage.
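Because the profile reports peak memory per host rather than a cluster-wide figure, you may want to roll the per-host numbers up yourself. The Python sketch below assumes you have already copied each host's peak (in bytes) out of the profile into a dict; the function names are hypothetical and nothing here parses a real profile. The sum of per-host peaks is only an upper bound on simultaneous usage, since the peaks need not happen at the same moment.

```python
def summarize_peaks(peak_bytes_by_host):
    """Roll per-host peak memory (bytes) up into a few cluster-level numbers.

    sum_of_peaks is an upper bound on simultaneous cluster-wide usage,
    because the per-host peaks need not coincide in time.
    """
    if not peak_bytes_by_host:
        raise ValueError("expected at least one host")
    # The host with the largest peak is usually the one closest to a per-host limit.
    busiest = max(peak_bytes_by_host, key=peak_bytes_by_host.get)
    return {
        "busiest_host": busiest,
        "busiest_peak": peak_bytes_by_host[busiest],
        "sum_of_peaks": sum(peak_bytes_by_host.values()),
        "num_hosts": len(peak_bytes_by_host),
    }


def mib(num_bytes):
    """Format a byte count in MiB for display."""
    return f"{num_bytes / (1024 * 1024):.2f} MiB"
```

With made-up inputs such as `{"host-a": 512 * 2**20, "host-b": 768 * 2**20}`, the busiest host is "host-b" and `mib(result["sum_of_peaks"])` gives "1280.00 MiB".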
01-26-2017 09:48 AM
Hi hakki, we haven't implemented that yet. Currently, I think the only way to get that information is to run SHOW FILES. - Tim