About Tim Armstrong

Tim Armstrong · ‎02-02-2017

I think this is related to IMPALA-4610 (https://issues.cloudera.org/browse/IMPALA-4610). It looks like you already discovered the workaround of using full subqueries.

Tim Armstrong · ‎02-02-2017

If you want to experiment, you could try setting the query options "PREFETCH_MODE=0" and "DISABLE_STREAMING_PREAGGREGATIONS=1". You could also try adjusting NUM_SCANNER_THREADS. For that last option, we recently changed our recommendation to the number of logical cores on the machine (the default was originally 3x logical cores; we changed it in a later release). You can set defaults for all queries via the "default query options" impalad startup option.
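For anyone who wants to try these settings, here is a minimal sketch that generates the SET statements to paste into impala-shell. The helper name scanner_tuning_statements is hypothetical, and it assumes the script runs on a machine with the same core count as the impalad hosts; otherwise pass the core count explicitly.

```python
import os


def scanner_tuning_statements(logical_cores=None):
    """Build SET statements for the three query options mentioned above.

    Hypothetical helper: it only produces text to paste into impala-shell
    and does not talk to Impala itself.
    """
    if logical_cores is None:
        # os.cpu_count() reports logical cores of the local machine and may
        # return None, so fall back to 1 in that case.
        logical_cores = os.cpu_count() or 1
    options = {
        "PREFETCH_MODE": 0,
        "DISABLE_STREAMING_PREAGGREGATIONS": 1,
        "NUM_SCANNER_THREADS": logical_cores,
    }
    return [f"SET {name}={value};" for name, value in options.items()]


if __name__ == "__main__":
    for statement in scanner_tuning_statements(logical_cores=16):
        print(statement)
```

These are experiments, so it is worth changing one option at a time and comparing query profiles before and after.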

Tim Armstrong · ‎02-02-2017

Can you share the explain plan and the query? I'm interested to know what data types and aggregate functions you're using. We're not aware of any performance regressions between those versions - we actually made a number of improvements, e.g. changing the hash table implementation and adding software prefetching. It's possible that your workload hit one of the edge cases where these changes had a detrimental effect.

Tim Armstrong · ‎01-31-2017

There's a bit of a story there. When we started preparing the 5.10 CDH release, the Apache 2.8 Impala release was not ready, so we had to call it "Impala 2.7" in the version number. Impala 2.8 was officially released after we finished putting together the CDH5.10 release - too late to bump the version in all places. CDH5.10 Impala is almost exactly the same as 2.8, plus or minus a few patches, so in most of the announcements we've just called it 2.8.

Tim Armstrong · ‎01-31-2017

Hi Sanjumani, My guess is that the query wasn't able to get enough memory because of other concurrent queries. It consumed only 160.58MB of memory and probably wasn't able to get more. If you have access to the Impala debug web UI, you can look at http://hostname:25000/queries to see what other queries are running on that coordinator, and http://hostname:25000/memz?detailed=true to see what is consuming memory on each host. It's also good to confirm Impala's memory limit setting: you can see "mem_limit" on http://hostname:25000/varz - Tim
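If you need to check several hosts, a small sketch like the following can generate the three debug pages mentioned above for each one. The function name debug_ui_urls and the example hostname are placeholders; 25000 is the debug web UI port used in the URLs above.

```python
def debug_ui_urls(hostname, port=25000):
    """Return the debug web UI pages referenced above for one impalad host.

    Hypothetical helper: it only formats URLs and does not contact the host.
    """
    base = f"http://{hostname}:{port}"
    return {
        "queries": f"{base}/queries",            # queries running on this coordinator
        "memz": f"{base}/memz?detailed=true",    # what is consuming memory on this host
        "varz": f"{base}/varz",                  # startup settings, including mem_limit
    }


if __name__ == "__main__":
    # "impalad-1.example.com" is a placeholder hostname.
    for page, url in debug_ui_urls("impalad-1.example.com").items():
        print(f"{page}: {url}")
```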

Tim Armstrong · ‎01-31-2017

We don't support UDFs messing around with Impala's runtime data structures. We don't expose this to UDFs since UDFs aren't really meant to do things like I/O.

Tim Armstrong · ‎01-30-2017

Hi Akhil, It is technically possible to read files from an Impala C++ UDF, since we don't sandbox UDFs. However, I would strongly recommend against this because it can lead to resource usage problems and any mistakes can compromise the stability of Impala. My recommendation is to rethink your workflow to avoid the need for this kind of approach if possible. - Tim

Tim Armstrong · ‎01-27-2017

That's a good point. I updated the JIRA description to provide that additional motivation. As an open-source project, we're somewhat dependent on people finding time to pick up new features like this that are nice-to-have but not critical for many users.

Tim Armstrong · ‎01-26-2017

This blog post provides a nice introduction to Impala's admission control: https://blog.cloudera.com/blog/2016/12/resource-management-for-apache-impala-incubating/

There are a few ways to inspect a query's memory usage. The query profile and summary have stats about peak memory usage per host and for each operator in the query. The stats are generally per-host rather than cluster-wide aggregates. If you're using impala-shell, you can also "set live_summary=1" to get a live update of the query as it makes progress.

If you want to see the live state of all queries running on an Impala daemon as an admin, you can look at the "memz" tab of the Impala web UI (by default on port 25000). That shows the full tree of tracked memory from the process down to the operator level. Cloudera Manager also has various charts of aggregate memory usage.

Tim Armstrong · ‎01-26-2017

Hi hakki, We haven't implemented that yet. Currently I think the only way to get that info is to run "Show files". - Tim

Member Since	‎07-29-2015 04:07 PM
Last Visited	‎02-11-2021 06:07 PM
Posts	535
Kudos received	141

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Right deep join SQL syntax

Re: Impala - performance degradation between 2.6 a...

Re: Impala - performance degradation between 2.6 a...

Re: CDH5.10 redhat repo doesn't have Impala 2.8 RP...

Re: impala memory limit exceed

Re: Is there is any method to read property config...

Re: Is there is any method to read property config...

Re: Impala support for INPUTFILENAME in hive

Re: Impala query memory usage

Re: Impala support for INPUTFILENAME in hive