Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 102
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1993 | 12-18-2020 01:46 PM
 | 1355 | 12-16-2020 12:11 PM
 | 838 | 12-07-2020 01:47 PM
 | 779 | 12-07-2020 09:21 AM
 | 455 | 10-14-2020 11:15 AM
12-05-2016
11:05 AM
It looks like your catalog service may be having problems. It would be worth looking in the catalogd logs for clues.
11-23-2016
03:32 PM
1 Kudo
We had an issue filed for this a while back: https://issues.cloudera.org/browse/IMPALA-3293 . It seems fairly reasonable, but I think it will depend on how much demand there is for it (or if someone contributes a patch for it).
11-23-2016
09:18 AM
1 Kudo
You're absolutely right - we use 10% as the default estimate for selectivity for scan predicates when we don't have a better estimate. One case where we have a better estimate is when the predicate is something like id = 100. In that case we can estimate that the selectivity is 1 / (num distinct values). There's also some logic to handle combining the estimates when there are multiple conditions. If you're curious, the code is here: https://github.com/apache/incubator-impala/blob/4db330e69a2dbb4a23f46e34b484da0d6b9ef29b/fe/src/main/java/org/apache/impala/planner/PlanNode.java#L518
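To make that concrete, here's a rough sketch of how the estimates work out. The table, row count, and column stats below are hypothetical, and the exact logic for combining multiple predicates depends on the Impala version:
-- hypothetical table t with 1,000,000 rows and NDV(id) = 1,000 in the column stats
-- equality predicate: selectivity is estimated as 1 / 1,000, so roughly 1,000 rows out of the scan
SELECT * FROM t WHERE id = 100;
-- a predicate with no NDV-based estimate falls back to the 10% default: roughly 100,000 rows
SELECT * FROM t WHERE name LIKE '%abc%';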
11-18-2016
06:03 PM
We added support for --ldap_password_cmd in Impala 2.5, which I think addresses this problem. See https://issues.cloudera.org/browse/IMPALA-1934 and https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_shell_options.html
11-18-2016
06:01 PM
This would typically happen if the catalog daemon was restarted.
11-16-2016
03:13 PM
If you're using impala-shell, you can use the "summary;" command. Otherwise it's accessible through the Impala debug web pages (typically http://the-impala-server:25000)
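For example, inside an impala-shell session (the table name here is hypothetical), run the query you want to inspect and then type summary; to print the per-operator breakdown of that last query:
SELECT count(*) FROM web_logs;
summary;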
11-08-2016
05:24 PM
Please do open a JIRA - it's always good to have some context on the problem from users. It looks like the scanners in that profile are just idle (based on the user and system time) - so my guess is that the slowdown is something further up in the plan.
11-07-2016
02:34 PM
We could definitely improve some of the diagnostics there. My guess is that one node is either overloaded or has some kind of hardware issue - might be worth looking at the health and CPU/memory usage of different nodes to see if one stands out.
11-04-2016
05:25 PM
1 Kudo
One thing to keep in mind when interpreting the profiles is that a series of joins will typically be pipelined to avoid materialising results. This means that the whole pipeline runs at the speed of the slowest part of the pipeline. So the limiting factor could be the client (if you're returning a lot of results), the scan at the bottom of the plan, or any of the joins in the pipeline. TotalNetworkSendTime may be somewhat misleading, since if the sender is running faster than the receiver, a backpressure mechanism kicks in that blocks the sender until the receiver has caught up. What I'd recommend initially is comparing the query summaries of the fast and slow queries to see where the difference in time is. If you're running in impala-shell you can get the summary of the last query by typing "summary;"
10-13-2016
03:20 PM
Good to hear! Please feel free to mark it as solved to make it easier for others to find.
10-03-2016
02:25 PM
Some examples of the calculations and numbers would be helpful. We use a C++ double as the underlying type, so we have the same precision. There are a lot of subtleties with floating point numbers where calculations that are mathematically equivalent with real numbers can give different results with floating point numbers. E.g. floating point addition is not associative, so it's not guaranteed that (a + b) + c == a + (b + c), and reordering a sum can change the result. On x86 there's also some additional weirdness where intermediate results of calculations are represented with 80 bits if they're kept in floating-point registers but reduced in precision to 64 bits if they're written to memory: https://en.wikipedia.org/wiki/Extended_precision. At the C++ or SQL level you have very little control over which precision is used. Fixed-precision decimal will give you more predictable results if your application isn't tolerant of rounding errors.
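A small illustration in Impala SQL (a sketch; the exact digits displayed may vary by version and client):
-- 0.1 and 0.2 have no exact double representation, so this returns roughly 0.30000000000000004 rather than 0.3
SELECT cast(0.1 AS DOUBLE) + cast(0.2 AS DOUBLE);
-- at this magnitude a double cannot represent the +1 at all, so this returns 0 rather than 1
SELECT cast(1e16 AS DOUBLE) + 1 - cast(1e16 AS DOUBLE);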
09-27-2016
09:50 AM
1 Kudo
Thanks for the data point :). We're tracking the parallelisation work here: https://issues.cloudera.org/browse/IMPALA-3902 . It's probably going to get enabled in phases - we may have parallelisation for aggregations before joins for example.
09-23-2016
10:41 AM
No, I don't think you're missing any obvious optimisation. Yes, we only use a single core per aggregation per Impala daemon. This is obviously not ideal, so we have a big push right now to do full parallelisation of every operator.
09-22-2016
12:46 PM
It's aggregating 10 million rows per core per second, which is within expectations - the main factor affecting performance is that the aggregation runs on a single core per Impala daemon. We are currently working on multi-threaded joins and aggregation, which would increase the level of parallelism available in this case. There were also some improvements to the aggregation in Impala 2.6 (https://issues.cloudera.org/browse/IMPALA-3286) that might improve throughput a bit (I'd guess somewhere between a 10% and 80% speedup depending on the input data).
09-13-2016
02:35 PM
Eric's suggestion is the general solution to this problem - without stats Impala is choosing a bad join order and there are a lot of duplicates on the right side of the join. One workaround is to add a straight_join hint, which lets you control the order in which the tables are joined. I believe in your case just adding straight_join will flip the sides of the join, which will almost certainly help you.
SELECT `dim_experiment`.`experiment_name` AS `experiment_name`
FROM `gwynniebee_bi`.`fact_recommendation_events` `fact_recommendatio`
LEFT OUTER JOIN `gwynniebee_bi`.`dim_experiment` `dim_experiment` ON (`fact_recommendatio`.`experiment_key` = `dim_experiment`.`experiment_key`)
GROUP BY 1
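For reference, a sketch of the same query with the hint added (using Impala's SELECT STRAIGHT_JOIN syntax; it's worth verifying the resulting plan with EXPLAIN):
SELECT STRAIGHT_JOIN `dim_experiment`.`experiment_name` AS `experiment_name`
FROM `gwynniebee_bi`.`fact_recommendation_events` `fact_recommendatio`
LEFT OUTER JOIN `gwynniebee_bi`.`dim_experiment` `dim_experiment` ON (`fact_recommendatio`.`experiment_key` = `dim_experiment`.`experiment_key`)
GROUP BY 1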
08-15-2016
09:54 AM
This is a known issue that we're actively working on: https://issues.cloudera.org/browse/IMPALA-2567 Your analysis is accurate. Part of the problem is the number of connections and the other part is the number of threads per connection. You may be able to change some operating system config settings to increase limits here (depending on which limit you're hitting). In order to reduce the number of TCP connections required you would either need to reduce the number of fragments or reduce the number of nodes executing the query. You could reduce the number of fragments by breaking up the query into smaller queries, e.g. by creating temporary tables with the results of some of the subqueries. You could also try executing the query on a single node by setting num_nodes=1 if the data size is small enough that this makes sense. I suspect your query is too large for that to work, but it's hard to tell (that's a huge query plan!)
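A rough sketch of both workarounds (the table names and query here are hypothetical, not from your plan):
-- break the query up: materialise part of it into an intermediate table first
CREATE TABLE tmp_daily_totals STORED AS PARQUET AS
SELECT customer_id, SUM(amount) AS total_amount
FROM sales
GROUP BY customer_id;

SELECT c.name, t.total_amount
FROM tmp_daily_totals t
JOIN customers c ON t.customer_id = c.customer_id;

-- or, if the data is small enough, run the whole query on a single node
SET num_nodes=1;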
08-05-2016
05:35 PM
I'm not the most knowledgeable person about this part of the code, but what you're saying is correct. One of the likely causes of long wait times is if the receiver is consuming data slower than the sender is sending it.
06-30-2016
08:53 AM
1 Kudo
I think we already have an open issue for this that is being actively worked on: https://issues.cloudera.org/browse/IMPALA-3210 . I.e. we don't support it yet, but it's in the pipeline.
06-28-2016
07:53 AM
The only way to do this with zero work would be to use a view. See http://www.cloudera.com/documentation/enterprise/latest/topics/impala_create_view.html Otherwise you do have to run the queries as part of your data pipeline, as you mentioned.
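A minimal sketch of the view approach (hypothetical table and column names):
CREATE VIEW recent_orders AS
SELECT order_id, customer_id, order_total
FROM orders
WHERE order_date >= '2016-01-01';

-- queries against the view are rewritten against the base table at query time
SELECT count(*) FROM recent_orders;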
06-24-2016
12:25 PM
That probably makes sense if the bottleneck is evaluating the where clause. If those extra rows are filtered out in the join, then the gain is limited, since you should filter out the extra rows during the scan or when evaluating the simple join condition. Our scans are multithreaded too, so sometimes if the join is the bottleneck, making the scans do more work doesn't slow down the query overall.
06-24-2016
11:50 AM
The main difference seems to be execution skew. In the second profile the max time for the join is over 3 minutes, compared to much lower in the first profile. The average time isn't very different between the profiles. Probably the partitioning resulted in the data being distributed differently between the nodes, and for some reason that one node is slower. It doesn't look like it's necessarily processing more data, but maybe the node is more heavily loaded, or the data is somehow different. Is the join condition something complicated? It's only processing a few thousand rows per second through the join, which is very low.
06-23-2016
06:42 PM
I don't think it's on the immediate roadmap; our focus recently has been on various other things (performance, Amazon EC2 support, etc.)
06-23-2016
01:05 PM
That feature has not made it in unfortunately. The documentation at http://www.cloudera.com/documentation.html is the source of truth about what features are or are not present.
06-17-2016
04:04 PM
Ok, that's interesting. The nonsense-looking symbol at the top is probably JITted code from your query, probably an expression or something like that:
36.50% perf-18476.map [.] 0x00007f3c1d634b82
The other symbols like GetDoubleVal() may be what is calling this expensive function. It looks like it's possibly ProbeTime in the profile that's the culprit. Can you share the SQL for your query at all? I'm guessing that there's some expression in your query that's expensive to evaluate, e.g. joining on some complex expression, or doing some kind of expensive computation.
06-17-2016
11:23 AM
Maybe run 'perf top' to see where it's spending the time? I'd expect the scan to run on one core and the join and insert to run on a different core.
06-17-2016
09:19 AM
There's something strange going on here: the profile reports that the scan took around 12 seconds of CPU time, but 17 minutes of wall-clock time. So for whatever reason the scan is spending most of its time swapped out and unable to execute.
- MaterializeTupleTime(*): 17m20s
- ScannerThreadsSysTime: 74.049ms
- ScannerThreadsUserTime: 12s312ms
Is the system under heavy load or is it swapping to disk?
05-23-2016
08:44 AM
Yes, that's right. It's enabled for some cases by default (broadcast joins) in Impala 2.5. To enable it for a wider category of joins you can set the query option runtime_filter_mode=global. This setting will become the default in Impala 2.7 because of the performance benefits.
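For example, from impala-shell (a minimal sketch; the option applies to subsequent queries in the session):
SET runtime_filter_mode=global;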
05-02-2016
09:21 PM
We often use TPC-H and TPC-DS; they're pretty standard for analytical databases. There's a TPC-DS kit for Impala here: https://github.com/cloudera/impala-tpcds-kit
05-02-2016
09:19 PM
There's no direct way to find out from the profile unfortunately. If you have a live system you can look at the /threadz page on the impala debug web page (port 25000 on each Impala daemon by default) to see how many hdfs-scan-node threads are running.
04-29-2016
10:59 AM
Impala limits the number of threads executing the query plan by design. Impala dynamically increases the number of scanner threads provided there are CPU and memory resources available - in this case it seems like there weren't CPU resources available. If the machine is already busy, adding more threads can actually decrease query throughput.