About Tim Armstrong

Tim Armstrong · ‎11-16-2016

If you're using impala-shell, you can use the "summary;" command. Otherwise it's accessible through the Impala debug web pages (typically http://the-impala-server:25000).
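
If you want to script access to the debug pages, here is a minimal Python sketch that builds a debug-page URL and fetches it with the standard library. It assumes the default port 25000 mentioned above. `debug_page_url` and `fetch_debug_page` are hypothetical helper names, and the `queries` page used in the usage line is an assumption, so check which pages your Impala version's debug UI actually exposes.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen

# Assumed default port of the Impala daemon's debug web UI (see above).
DEFAULT_DEBUG_PORT = 25000


def debug_page_url(host: str, page: str = "", port: int = DEFAULT_DEBUG_PORT) -> str:
    """Build the URL of a debug web page on an Impala daemon (hypothetical helper)."""
    path = "/" + page.lstrip("/")
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))


def fetch_debug_page(host: str, page: str = "", timeout: float = 10.0) -> str:
    """Download a debug page as text; raises an exception on network errors."""
    with urlopen(debug_page_url(host, page), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch_debug_page("the-impala-server", "queries")` would return the HTML of that page, assuming it exists on your version.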

mauricio · ‎11-15-2016

I've opened IMPALA-4492. Thanks Tim. -m

Tim Armstrong · ‎10-13-2016

Good to hear! Please feel free to mark it as solved to make it easier for others to find.

Tim Armstrong · ‎09-27-2016

Thanks for the data point :). We're tracking the parallelisation work here: https://issues.cloudera.org/browse/IMPALA-3902 . It's probably going to get enabled in phases - we may have parallelisation for aggregations before joins for example.

Tim Armstrong · ‎08-15-2016

This is a known issue that we're actively working on: https://issues.cloudera.org/browse/IMPALA-2567 . Your analysis is accurate. Part of the problem is the number of connections and the other part is the number of threads per connection. You may be able to raise some of these limits by changing operating system configuration settings (depending on which limit you're hitting). To reduce the number of TCP connections required, you would need to either reduce the number of fragments or reduce the number of nodes executing the query. You could reduce the number of fragments by breaking the query up into smaller queries, e.g. by creating temporary tables with the results of some of the subqueries. You could also try executing the query on a single node by setting num_nodes=1, if the data size is small enough for that to make sense. I suspect your query is too large for that to work, but it's hard to tell (that's a huge query plan!)
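
To see why both levers help, here is a back-of-envelope model in Python. It is a hypothetical simplification, not Impala's actual connection accounting: it assumes every exchange between fragments has one sending and one receiving instance per node, and that each sender opens its own connection to each receiver. Under that assumption, connections grow linearly with the number of exchanges but quadratically with the number of nodes. `estimate_stream_connections` and the example numbers are made up for illustration.

```python
def estimate_stream_connections(num_exchanges: int, num_nodes: int) -> int:
    """Rough connection count for one query under a hypothetical all-to-all model.

    Assumption (not Impala's real accounting): each exchange has one sender and
    one receiver instance per node, and every sender connects to every receiver.
    """
    if num_exchanges < 0:
        raise ValueError("num_exchanges must be non-negative")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # num_nodes senders, each connected to num_nodes receivers, per exchange.
    return num_exchanges * num_nodes * num_nodes


baseline = estimate_stream_connections(100, 20)
fewer_fragments = estimate_stream_connections(50, 20)  # split the query in half
fewer_nodes = estimate_stream_connections(100, 10)  # run on half the nodes
```

In this model, splitting the query so each piece has half the exchanges halves the count (40000 to 20000), while halving the number of nodes quarters it (40000 to 10000).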

Tim Armstrong · ‎08-05-2016

I'm not the most knowledgeable person about this part of the code, but what you're saying is correct. One of the likely causes of long wait times is if the receiver is consuming data slower than the sender is sending it.
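
Here is a small Python analogy for that effect. It uses a bounded buffer between a sender thread and a receiver thread and is not Impala's actual data stream code. When the receiver is slow, the buffer fills up and the sender spends most of its time blocked waiting for space, which shows up as sender-side wait time even though the sender itself has nothing else to do. `sender_wait_seconds` and its parameters are hypothetical.

```python
import queue
import threading
import time


def sender_wait_seconds(num_batches: int, consumer_delay: float, buffer_size: int = 2) -> float:
    """Total time a sender spends blocked on a full buffer feeding a receiver."""
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel telling the receiver to stop

    def receiver() -> None:
        while True:
            item = buf.get()
            if item is done:
                return
            time.sleep(consumer_delay)  # simulate slow downstream processing

    worker = threading.Thread(target=receiver)
    worker.start()
    waited = 0.0
    for batch in range(num_batches):
        start = time.perf_counter()
        buf.put(batch)  # blocks while the buffer is full
        waited += time.perf_counter() - start
    buf.put(done)
    worker.join()
    return waited
```

With a receiver that sleeps 20 ms per batch, most of the sender's elapsed time is spent inside `put`; with a receiver that does no work, the sender barely waits at all.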

Matt Jacobs · ‎06-30-2016

We're working on it and it should be in the next release, but unfortunately we're not aware of a good way to simulate the same results.

Tim Armstrong · ‎06-28-2016

The only way to do this with zero work would be to use a view. http://www.cloudera.com/documentation/enterprise/latest/topics/impala_create_view.html Otherwise you do have to run the queries as part of your data pipeline as you mentioned.
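
To illustrate the difference, here is a Python sketch that uses the standard-library sqlite3 module as a stand-in for Impala. The `CREATE VIEW` and `CREATE TABLE ... AS SELECT` statements are similar in spirit, but check the Impala documentation linked above for its exact syntax. The table and view names are hypothetical. A view stores only the query, so it picks up new rows automatically, while a table created from a query is a snapshot that your pipeline has to rebuild.

```python
import sqlite3


def view_vs_snapshot() -> tuple[int, int]:
    """Compare a view with a CREATE TABLE ... AS SELECT snapshot after new data arrives."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE events (id INTEGER, amount REAL)")
        cur.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10.0), (2, 5.0)])

        # A view only stores the query; it is re-evaluated every time it is read.
        cur.execute("CREATE VIEW large_events_v AS SELECT * FROM events WHERE amount > 1")
        # A table built from a query is a one-off snapshot of the data at that moment.
        cur.execute("CREATE TABLE large_events_t AS SELECT * FROM events WHERE amount > 1")

        # New data lands in the base table after both objects were created.
        cur.execute("INSERT INTO events VALUES (3, 7.5)")

        view_rows = cur.execute("SELECT COUNT(*) FROM large_events_v").fetchone()[0]
        table_rows = cur.execute("SELECT COUNT(*) FROM large_events_t").fetchone()[0]
        return view_rows, table_rows
    finally:
        conn.close()
```

Here the view sees all three rows while the snapshot table still has two. The trade-off is that the view re-runs its query every time it is read, so the work moves from the pipeline to query time.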

Tim Armstrong · ‎06-24-2016

That probably makes sense if the bottleneck is evaluating the where clause. If those extra rows are filtered out in the join, then the gain is limited, since you should filter out the extra rows during the scan or when evaluating the simple join condition. Our scans are multithreaded too, so sometimes if the join is the bottleneck, making the scans do more work doesn't slow down the query overall.

manchamp · ‎06-17-2016

Thank you for explaining it. It's a function call. I'm changing it and will see the impact. I'll come back with the results.

Last Visited	‎02-11-2021 06:07 PM

Member Since	‎07-29-2015 04:07 PM
Posts	535
Kudos received	141

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: impala memory limit exceed

Re: Help diagnosing slow query (even though fast h...

Re: Precision of DoubleVal calculations in udf

Re: AGGREGATE of query is to long

Re: Single Query (with 253 Plan Fragments) Causes ...

Re: total_network_send_timer and thrift_transmit_t...

Re: Forward filling and back filling from adjacent...

Re: How to create impala derived tables

Re: Difference between these profiles to explain t...

Re: Improve impala execution rate?