Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7604 | 12-18-2020 01:46 PM
 | 4984 | 12-16-2020 12:11 PM
 | 3792 | 12-07-2020 01:47 PM
 | 2471 | 12-07-2020 09:21 AM
 | 1613 | 10-14-2020 11:15 AM
03-28-2017
10:16 AM
This sounds like it could be a bug then. Is there any way you could provide us with more information about the files that are causing problems so we can try to reproduce it in-house? Having the actual data is ideal, but even information about the file sizes may be helpful.
03-24-2017
01:34 PM
I'm not sure that I've seen a problem exactly like that before. That error message occurs when Impala can't read the Parquet file footer correctly. We sometimes see problems like this when overwriting files in place, because Impala's metadata about file sizes gets out of sync with the actual state of the filesystem. Does your workload involve anything like that? Have you tried running "REFRESH <table>" to force a refresh of the file metadata? Just to check - is the HDFS caching addressing a specific performance problem? Often it's not necessary because the operating system is pretty effective at caching frequently-accessed files.
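As a minimal example (the table name here is only a placeholder), the refresh from impala-shell would look like:
-- Reload the file metadata for the affected table after its files were
-- rewritten in place ('web_logs' is a made-up name)
REFRESH web_logs;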
03-21-2017
02:00 PM
I should clarify that DECIMAL_V2 is currently just an experimental flag and the behaviour may change. Its behaviour will likely be in flux until all the subtasks of https://issues.apache.org/jira/browse/IMPALA-4924 are finished.
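For anyone experimenting, the flag is set per session as a query option - a sketch only, since the option is experimental and its behaviour may change between releases:
-- Enable the experimental DECIMAL V2 behaviour for this session
SET DECIMAL_V2=true;
-- Then re-run the decimal expressions of interest, e.g.
SELECT 5.56 - 0.36;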
03-21-2017
11:29 AM
I think it was probably unable to get enough memory because of other concurrently executing queries. This is somewhat counterintuitive, but if you set the mem_limit query option to an amount of memory that the query can reliably obtain, e.g. 2GB, then when it hits that limit spill-to-disk will kick in and the query should be able to complete (albeit slower than running fully in memory). We generally recommend that all queries run with a mem_limit set. You can configure a default mem_limit via the "default query options" config or by setting up memory-based admission control. We have some good docs about how to set up memory-based admission control here: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html#admission_memory We're actively working on improving this so that it's more hands-off.
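As a hedged example, setting a per-session limit from impala-shell looks like this (the 2GB value is only illustrative - pick a value the query can reliably obtain on your cluster):
-- Cap queries in this session at 2GB; once a query reaches the limit it
-- spills to disk instead of failing with "memory limit exceeded"
SET MEM_LIMIT=2g;
-- then run the problematic query as usual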
03-21-2017
11:22 AM
I think there are a couple of things going on. You may be running into some of the rounding issues covered by https://issues.apache.org/jira/browse/IMPALA-4810. In Impala 2.9 there will be a DECIMAL_V2 query option that will switch to a decimal mode that addresses some of these problems. The expression (5.56 - 0.36) / 1200 is also treated as DOUBLE rather than DECIMAL, which is confusing; we plan to fix that in https://issues.apache.org/jira/browse/IMPALA-3437. If you change 1200 to 1200.0 you'll actually get a DECIMAL result. You can use the typeof() function to inspect the result types of expressions:
[localhost:21000] > select typeof((5.56 - 0.36) / 1200), typeof((5.56 - 0.36) / 1200.0);
+------------------------------+--------------------------------+
| typeof((5.56 - 0.36) / 1200) | typeof((5.56 - 0.36) / 1200.0) |
+------------------------------+--------------------------------+
| DOUBLE | DECIMAL(11,8) |
+------------------------------+--------------------------------+
Fetched 1 row(s) in 0.01s
03-17-2017
08:53 AM
1 Kudo
This is a bug in the impala-udf-dev package in versions 5.9.x to 5.10.x. It was always intended to be compilable with older versions of gcc. It will be fixed in 5.11+ once that is released. If you downgrade the package to version 5.8.x or earlier, it should also work.
03-14-2017
11:21 AM
One possible explanation is that the impalad crashed because of some problem with the data file. Are there any hs_err_pid*.log files in /var/log/impalad? Or any *.dmp files?
02-08-2017
03:42 PM
There may also be more details about the "memory limit exceeded" error in the log of a different Impala daemon, the one where it actually ran out of memory. I should also mention that we've been making a big push to improve the memory consumption and general performance of partitioned inserts - currently, with dynamic partitioning, memory consumption is very high when there are a large number of partitions. See https://issues.cloudera.org/browse/IMPALA-2522
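For context, a hedged sketch of the pattern that hits this (the table and column names are made up): each distinct value of the partition key in the select list gets its own partition writer, so memory use grows with the number of partitions being written.
-- Hypothetical dynamically partitioned insert; 'sales_part', 'sales_staging',
-- 'item', 'price' and 'day' are placeholder names
INSERT INTO sales_part PARTITION (day)
SELECT item, price, day FROM sales_staging;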
02-08-2017
03:39 PM
I think the key question is why the effective process memory limit is 1GB. In the error you pasted above it says the process limit is 1.00GB: "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB" You can look at the memory limits and consumption on an Impala daemon's debug page at http://hostname:25000/memz?detailed=true - that may help identify any problematic Impala daemons. Even if you're setting the memory limit to 100GB, on startup the Impala daemon does some additional checks on the amount of memory available. You can look for a couple of things in the logs:
1. If vm overcommit and swapping are both disabled, you may see a message in the WARNING and INFO logs along the lines of "This system shows a discrepancy between the available memory and the memory commit limit ..." that will explain what the effective memory limit is.
2. Impala's view of the amount of physical memory available, e.g. "I0208 14:54:02.425094 4482 init.cc:220] Physical Memory: 31.33 GB"
02-03-2017
03:04 PM
2 Kudos
It's not entirely obvious to me what changed. Since the query is grouping by a timestamp, you're running into https://issues.cloudera.org/browse/IMPALA-3884, which forces it to execute the aggregation in an interpreted mode - the relevant line from the profile is:
ExecOption: Codegen Disabled: HashTableCtx::CodegenEvalRow(): type TIMESTAMP NYI
I suspect that the performance of that interpreted code path may have changed a bit - we tend to focus most of our tuning on the codegened code path. On releases with IMPALA-3884 fixed it should be a *lot* faster.
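To illustrate, a hedged sketch of the kind of query that hits this on affected releases (the table and column names are made up):
-- Grouping directly on a TIMESTAMP column disables codegen for the
-- aggregation on releases without the IMPALA-3884 fix
-- ('events' and 'event_ts' are placeholder names)
SELECT event_ts, count(*) FROM events GROUP BY event_ts;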