About Tim Armstrong

AkhilTM · ‎01-30-2017

Thanks for the solution, it works!

hakki · ‎01-29-2017

Yes, I thought about partitioning on the filename pattern. For one type of data I have, that is acceptable, since there are only 10-15 different file name types. However, in another type of data there is a unique "id" field in the filename, and this field takes a very large number of distinct values. So there would be a risk of creating a huge number of partitions, which could be a heavy burden on the catalog server.

Tim Armstrong · ‎01-26-2017

This blog post provides a nice introduction to Impala's admission control: https://blog.cloudera.com/blog/2016/12/resource-management-for-apache-impala-incubating/

There are a few ways to inspect a query's memory usage. The query profile and summary will have stats about peak memory usage per host and for each operator in the query. The stats are generally per-host, instead of cluster-wide aggregates. If you're using impala-shell, you can also "set live_summary=1" to get a live update of the query as it makes progress.

If you want to see the live state of all queries running on an Impala daemon as an admin, you can look at the "memz" tab of the Impala web UI (by default on port 25000). That will show the full tree of tracked memory from the process down to the operator level. Cloudera Manager also has various charts of aggregate memory usage.
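As a rough illustration of the web UI route, here is a minimal Python sketch that builds the address of the "memz" page for one daemon and downloads it. It assumes the page is served at the path /memz over plain HTTP with no authentication, which may not hold on a secured cluster; the helper names and the example hostname are made up for illustration.

```python
from urllib.request import urlopen

# Default Impala daemon web UI port, as mentioned in the post above.
DEFAULT_WEBUI_PORT = 25000


def memz_url(host: str, port: int = DEFAULT_WEBUI_PORT) -> str:
    """Build the URL of the memz page for one Impala daemon.

    Assumption: the page lives at the path /memz and is served over plain HTTP.
    """
    return f"http://{host}:{port}/memz"


def fetch_memz(host: str, port: int = DEFAULT_WEBUI_PORT, timeout: float = 10.0) -> str:
    """Download the memz page and return its raw body as text.

    Assumption: no authentication or TLS is required to reach the web UI.
    """
    with urlopen(memz_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical hostname; replace with one of your own Impala daemons.
    print(memz_url("impalad-1.example.com"))
```

Calling fetch_memz on each daemon in turn would give a per-host view, which matches the per-host nature of the stats described above.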

buntu · ‎01-12-2017

Given the size of the dataset, I believe the data fits in memory and it's not providing any additional performance improvement. Thanks!

AndrasK · ‎01-11-2017

Thank you very much for your quick and useful answer! Glad to know that it is a known bug.

Anyway, I have been playing around and made it work! For some reason the following query works, though I have no idea why:

select a+b+c+d segment
      ,min(r)
      ,max(r)
      ,count(ANYID) COUNTER
from (
  select ANYID
        ,r
        ,if(r >= 0.005,1,0) a
        ,if(r >= 0.0175,1,0) b
        ,if(r >= 0.08,1,0) c
        ,if(r >= 0.025,1,0) d
  from (
    select ANYID, RAND(unix_timestamp()) r
    from ANYTABLE
  ) foo
) bar
group by segment
order by segment

Do you have any idea why it works? 🙂

Thanks,
Andras
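One way to read the query above: r is produced once per row in the innermost subquery, so all four if() comparisons see the same value, and a+b+c+d simply counts how many thresholds that value meets. The Python sketch below mirrors only that arithmetic. It is not Impala code, the function name segment is made up for illustration, and it says nothing about the underlying bug.

```python
# Thresholds copied from the query above (columns a, b, c, d), in the same order.
THRESHOLDS = [0.005, 0.0175, 0.08, 0.025]


def segment(r: float) -> int:
    """Count how many thresholds r meets, mirroring a+b+c+d in the query."""
    return sum(1 if r >= t else 0 for t in THRESHOLDS)


if __name__ == "__main__":
    # One sample value from each of the five possible segments.
    for r in (0.001, 0.01, 0.02, 0.05, 0.5):
        print(r, segment(r))
```

Because the flags are summed, the order in which the thresholds are listed does not matter: the segment number is the count of thresholds at or below r, so it ranges from 0 to 4.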

Pettax · ‎01-10-2017

Hi Tim, thank you for taking the time to look at this issue! Br, Petter

RPAT · ‎12-30-2016

Thanks Tim, it was really helpful.

ScottChris · ‎12-08-2016

Increasing the catalog server heap resolved this problem.

Tim Armstrong · ‎11-23-2016

We had an issue filed for this a while back: https://issues.cloudera.org/browse/IMPALA-3293 . It seems fairly reasonable, but I think it will depend on how much demand there is for it (or on whether someone contributes a patch for it).

Tim Armstrong · ‎11-18-2016

We added support for --ldap_password_cmd in Impala 2.5, which I think addresses this problem. See https://issues.cloudera.org/browse/IMPALA-1934 and https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_shell_options.html

Last Visited	‎02-11-2021 06:07 PM

Member Since	‎07-29-2015 04:07 PM
Posts	535
Kudos received	140

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: How to create impala UDF common to all databas...

Re: Impala support for INPUTFILENAME in hive

Re: Impala query memory usage

Re: Impala performance with HDFS caching enabled

Re: Impala RANDOM, cases

Re: Impala has problems reading complex types from...

Re: In Impala UDF - While fetching NULL record St...

Re: error processing the impalad catalog update. R...

Re: What heuristics does Impala use for cardinalit...

Re: impala-shell login password