Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 103
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6055 | 12-18-2020 01:46 PM |
| | 3932 | 12-16-2020 12:11 PM |
| | 2787 | 12-07-2020 01:47 PM |
| | 1989 | 12-07-2020 09:21 AM |
| | 1277 | 10-14-2020 11:15 AM |
01-30-2017
12:46 AM
Thanks for the solution, it works!
01-29-2017
05:33 AM
Yes, I thought about partitioning on the filename pattern. For one type of data I have, that is acceptable, since there are only 10-15 different types of file name. However, in another type of data, the filename contains a unique "id" field that takes a very large number of distinct values. So there is a risk of creating lots of partitions, which could be a heavy load for the catalog server.
01-26-2017
09:53 AM
1 Kudo
This blog post provides a nice introduction to Impala's admission control: https://blog.cloudera.com/blog/2016/12/resource-management-for-apache-impala-incubating/ There are a few ways to inspect a query's memory usage. The query profile and summary will have stats about peak memory usage per host and for each operator in the query. The stats are generally per-host, instead of cluster-wide aggregates. If you're using impala-shell, you can also "set live_summary=1" to get a live update of the query as it makes progress. If you want to see the live state of all queries running on an Impala daemon as an admin, you can look at the "memz" tab of the Impala web UI (by default on port 25000). That will show the full tree of tracked memory from the process down to the operator level. Cloudera Manager also has various charts of aggregate memory usage.
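As a rough illustration of the web UI option above, the following sketch builds the address of the "memz" page on one Impala daemon and fetches it. The port default (25000) comes from the post; the `/memz` path is assumed from the tab name, the host name is a placeholder, and the sketch assumes the web UI is reachable over plain HTTP without authentication, which may not hold on a secured cluster.

```python
from urllib.request import urlopen


def memz_url(host, port=25000):
    """Build the URL of the memz page on one Impala daemon.

    The /memz path is an assumption based on the tab name; 25000 is the
    default web UI port mentioned in the post.
    """
    return f"http://{host}:{port}/memz"


def fetch_memz(host, port=25000, timeout=10):
    """Fetch the raw memz page (assumes plain HTTP and no authentication)."""
    with urlopen(memz_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder host name; replace it with a real impalad host before fetching.
    print(memz_url("impalad-1.example.com"))
```

The page is per daemon, so checking a whole cluster means repeating the request for each host.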
01-12-2017
04:24 PM
Given the size of the dataset, I believe the data fits in memory and it's not providing any additional performance improvement. Thanks!
01-11-2017
12:51 PM
Thank you very much for your quick and useful answer! Glad to know that it is a known bug. Anyway, I have been playing around and made it work. For some reason the following query works, though I have no idea why:
select
  a+b+c+d segment
  ,min(r)
  ,max(r)
  ,count(ANYID) COUNTER
from (
  select
    ANYID
    ,r
    ,if(r >= 0.005,1,0) a
    ,if(r >= 0.0175,1,0) b
    ,if(r >= 0.08,1,0) c
    ,if(r >= 0.025,1,0) d
  from (
    select ANYID,RAND(unix_timestamp()) r from ANYTABLE
  ) foo
) bar
group by segment
order by segment
Do you have any idea why it works? 🙂 Thanks, Andras
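To make it easier to see what the query computes (not why the workaround avoids the bug), here is a minimal Python sketch of the same segmenting logic. The thresholds are copied from the query. The helper names `segment` and `summarize` are hypothetical, and Python's seeded generator is only a stand-in for `RAND(unix_timestamp())`, so the numbers will not match Impala's output.

```python
import random
from collections import defaultdict

# Thresholds copied from the query, in the same order as columns a, b, c, d.
THRESHOLDS = (0.005, 0.0175, 0.08, 0.025)


def segment(r):
    """Return a+b+c+d: the number of thresholds that r meets or exceeds."""
    return sum(1 for t in THRESHOLDS if r >= t)


def summarize(values):
    """Group values by segment, returning {segment: (min, max, count)}.

    This mirrors the GROUP BY / ORDER BY with min(r), max(r), count(ANYID).
    """
    groups = defaultdict(list)
    for r in values:
        groups[segment(r)].append(r)
    return {s: (min(rs), max(rs), len(rs)) for s, rs in sorted(groups.items())}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed; stand-in for RAND(unix_timestamp())
    rows = [rng.random() for _ in range(100_000)]
    for seg, (lo, hi, n) in summarize(rows).items():
        print(seg, round(lo, 4), round(hi, 4), n)
```

Because the four flags are summed, the order of the thresholds in the query does not matter: the segment is simply how many thresholds `r` reaches, from 0 (below 0.005) to 4 (at or above 0.08).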
01-10-2017
11:34 AM
Hi Tim, thank you for taking the time to look at this issue! Br, Petter
12-30-2016
04:42 AM
Thanks Tim, it was really helpful.
12-08-2016
07:12 AM
Increasing the catalog server heap resolved this problem.
11-23-2016
03:32 PM
1 Kudo
We had an issue filed for this a while back: https://issues.cloudera.org/browse/IMPALA-3293 . It seems fairly reasonable, but I think it will depend on how much demand there is for it (or on whether someone contributes a patch for it).
11-18-2016
06:03 PM
We added support for --ldap_password_cmd in Impala 2.5, which I think addresses this problem. See https://issues.cloudera.org/browse/IMPALA-1934 and https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_shell_options.html