Member since
07-29-2015
535
Posts
140
Kudos Received
103
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6068 | 12-18-2020 01:46 PM |
| | 3942 | 12-16-2020 12:11 PM |
| | 2792 | 12-07-2020 01:47 PM |
| | 1992 | 12-07-2020 09:21 AM |
| | 1279 | 10-14-2020 11:15 AM |
07-22-2017
07:36 PM
@sri1993 please look at my response in this thread. I think it's a known issue: http://community.cloudera.com/t5/Cloudera-Manager-Installation/Impala-Catalog-Server-supervisor-permissions/m-p/56710#M11064 Let me know if that helps.
07-05-2017
08:38 AM
1 Kudo
I believe the 5.11 RPM should work ok.
07-02-2017
07:57 PM
I have re-run the test, and Kudu performs much better this time (though it's still a little slower than Parquet), thanks to @mpercy's suggestion. I changed two things when re-running the test: 1. increased the number of partitions for the fact table from 60 to 768 (affects all queries); 2. changed the 'or' predicate in query3.sql into an 'in' predicate, so the predicate can be pushed down to Kudu (only affects query 3). Below is the re-run result (column 'kudu60' is the previous result, with 60 partitions on the fact table; column 'kudu768' is the new result, with 768 partitions on the fact table):
06-27-2017
10:50 AM
1 Kudo
Yes, a lot of people have been hitting this after upgrading their kernels! Thank you for following up and confirming that you were able to fix the problem.
06-06-2017
03:04 PM
1 Kudo
Hi, Impala unfortunately doesn't support Python UDFs - we have C++ and Java UDF support only. It looks like Impyla had a limited prototype at one point but as far as I know it wasn't ever supported. - Tim
05-26-2017
10:48 AM
Thank you sir for helping me walk through the profile. This is very informative.
05-25-2017
02:20 PM
1 Kudo
That query probably has multiple big joins and aggregations and needs more memory to complete. A very rough rule of thumb for minimum memory in releases CDH5.9-CDH5.12 is the following. For each hash join, the minimum of 150MB or the amount of data on the right side of the node (e.g. if you have a few thousand rows on the right side, maybe a MB or two). For each merge aggregation, the minimum of 300MB or the size of grouped data in-memory (e.g. if you only have a few thousand groups, maybe a MB or two). For each sort, about 50-60MB. For each analytic, about 20MB. If you add all those up and add another 25%, you'll get a ballpark number for how much memory the query will require to execute. I'm working on reducing those numbers and making the system give a clearer yes/no answer on whether it can run the query before it starts executing.
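As a rough illustration, the rule of thumb above can be written out as a few lines of arithmetic. This is only a sketch of the ballpark calculation, not anything Impala itself exposes: the function name `estimate_min_query_memory`, its parameters, and the choice of 60MB per sort (the upper end of the 50-60MB range) are assumptions made for the example, and the input sizes have to be guessed from your own data.

```python
MB = 1024 * 1024

# Hypothetical helper: applies the rough per-operator minimums quoted above
# (CDH5.9-CDH5.12) and adds 25% on top. Not an Impala API.
def estimate_min_query_memory(join_right_side_bytes, agg_grouped_bytes,
                              num_sorts=0, num_analytics=0, sort_mb=60):
    """Return a ballpark minimum memory requirement in bytes.

    join_right_side_bytes: estimated right-side data size per hash join.
    agg_grouped_bytes: estimated in-memory grouped data size per aggregation.
    sort_mb: assumed cost per sort, 60MB here (the post says about 50-60MB).
    """
    # Each hash join: the smaller of 150MB and the right-side data size.
    joins = sum(min(150 * MB, size) for size in join_right_side_bytes)
    # Each aggregation: the smaller of 300MB and the grouped data size.
    aggs = sum(min(300 * MB, size) for size in agg_grouped_bytes)
    sorts = num_sorts * sort_mb * MB
    analytics = num_analytics * 20 * MB
    total = joins + aggs + sorts + analytics
    # Add another 25% on top of the summed operator minimums.
    return total * 5 // 4


if __name__ == "__main__":
    # Two joins (2MB and 1GB right sides), one 500MB aggregation,
    # one sort, one analytic function.
    estimate = estimate_min_query_memory(
        join_right_side_bytes=[2 * MB, 1024 * MB],
        agg_grouped_bytes=[500 * MB],
        num_sorts=1,
        num_analytics=1,
    )
    print(estimate // MB, "MB")  # prints "665 MB"
```

In this example the small join counts for only 2MB while the large join and the aggregation are capped at 150MB and 300MB, so the sum is 532MB before the 25% margin.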
04-24-2017
03:29 PM
Yep, you're right.
04-18-2017
05:38 PM
On the Impala dev team we do plenty of testing on machines with 16GB-32GB RAM (e.g. my development machine has 32GB RAM). So Impala definitely works with that amount of memory. It's just that with that amount of memory it's not too hard to run into capacity problems if you have a reasonable number of concurrent queries with larger data sizes or more complex queries. It sounds like maybe the smaller memory instances work well for your workload.