About Tim Armstrong

csguna · ‎07-22-2017

@sri1993 please take a look at my response in this thread. I think it's a known issue. http://community.cloudera.com/t5/Cloudera-Manager-Installation/Impala-Catalog-Server-supervisor-permissions/m-p/56710#M11064 Let me know if that helps.

Tim Armstrong · ‎07-05-2017

I believe the 5.11 RPM should work ok.

lewiss · ‎07-02-2017

I re-ran the test, and Kudu performs much better this time (though it's still a little slower than Parquet). Thanks to @mpercy for the suggestion. I changed two things for the re-run:
1, increased the partitions of the fact table from 60 to 768 (affects all queries)
2, changed the 'or' predicate in query3.sql into an 'in' predicate, so the predicate can be pushed down to Kudu (only affects query 3)
Below is the re-run result (column 'kudu60' is the previous result, with 60 partitions on the fact table; column 'kudu768' is the new result, with 768 partitions on the fact table):

Tim Armstrong · ‎06-27-2017

Yes, a lot of people have been hitting this after upgrading their kernels! Thank you for following up and confirming that you were able to fix the problem.

Tim Armstrong · ‎06-06-2017

Hi, Impala unfortunately doesn't support Python UDFs - we have C++ and Java UDF support only. It looks like Impyla had a limited prototype at one point but as far as I know it wasn't ever supported. - Tim

Beebeegun · ‎05-26-2017

Thank you sir for helping me walk through the profile. This is very informative.

Tim Armstrong · ‎05-25-2017

That query probably has multiple big joins and aggregations and needs more memory to complete. A very rough rule of thumb for minimum memory in releases CDH5.9-CDH5.12 is the following.
For each hash join, the minimum of 150MB or the amount of data on the right side of the node (e.g. if you have a few thousand rows on the right side, maybe a MB or two).
For each merge aggregation, the minimum of 300MB or the size of grouped data in-memory (e.g. if you only have a few thousand groups, maybe a MB or two).
For each sort, about 50-60MB.
For each analytic, about 20MB.
If you add all those up and add another 25% you'll get a ballpark number for how much memory the query will require to execute. I'm working on reducing those numbers and making the system give a clearer yes/no answer on whether it can run the query before it starts executing.
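The rule of thumb above is plain arithmetic, so it can be sketched in a few lines of Python. This is only an illustration of the ballpark calculation described in the post, not anything Impala provides: the function name, its parameters, and the choice of 60MB per sort (the upper end of the 50-60MB range) are assumptions made for the example, and the sizes you pass in are your own estimates.

```python
def estimate_min_query_memory_mb(join_right_side_mb, agg_grouped_data_mb,
                                 num_sorts=0, num_analytics=0, sort_mb=60.0):
    """Ballpark minimum memory (MB) for one query, per the rule of thumb above.

    Hypothetical helper, not an Impala API.
    join_right_side_mb:  estimated right-side data size (MB) for each hash join
    agg_grouped_data_mb: estimated in-memory grouped data size (MB) per aggregation
    """
    # Each hash join: the minimum of 150MB or the right-side data size.
    joins = sum(min(150.0, size) for size in join_right_side_mb)
    # Each merge aggregation: the minimum of 300MB or the grouped data size.
    aggs = sum(min(300.0, size) for size in agg_grouped_data_mb)
    # Each sort: about 50-60MB (60MB assumed by default here).
    sorts = num_sorts * sort_mb
    # Each analytic: about 20MB.
    analytics = num_analytics * 20.0
    # Add everything up, then add another 25%.
    return 1.25 * (joins + aggs + sorts + analytics)


# Example: one big join, one tiny join, one big aggregation, one sort, one analytic.
print(estimate_min_query_memory_mb([2000, 2], [500], num_sorts=1, num_analytics=1))
# (150 + 2 + 300 + 60 + 20) * 1.25 = 665.0
```

Treat the result as a rough lower bound for CDH5.9-CDH5.12 only; as the post says, the numbers are approximate.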

thewayofthinkin · ‎04-24-2017

Yep, you're right.

alex.behm · ‎04-21-2017

https://issues.apache.org/jira/browse/IMPALA-5243

Tim Armstrong · ‎04-18-2017

On the Impala dev team we do plenty of testing on machines with 16GB-32GB RAM (e.g. my development machine has 32GB RAM). So Impala definitely works with that amount of memory. It's just that with that amount of memory it's not too hard to run into capacity problems if you have a reasonable number of concurrent queries with larger data sizes or more complex queries. It sounds like maybe the smaller memory instances work well for your workload.

Last Visited	‎02-11-2021 06:07 PM

Member Since	‎07-29-2015 04:07 PM
Posts	535
Kudos received	140

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Impala daemon fail to start CDH 5.11.1 - unabl...

Re: UDF Problem: unresolvable relocation of 'ZNSs4...

Re: kudu is slower than parquet?

Re: Segmentation fault (core dumped) on Impalad an...

Re: Steps to write python UDF for Impala?

Re: Imapla query is slow, where is the bottleneck?

Re: impala memory limit exceed

Re: Need help with Impala 2.8 on CDH 5.10 upgrade

Re: CodeGen way to slow

Re: Low memory Impala nodes (e.g. 15GB RAM)