Member since: 07-29-2015
Posts: 535
Kudos Received: 141
Solutions: 103
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7604 | 12-18-2020 01:46 PM
 | 4984 | 12-16-2020 12:11 PM
 | 3792 | 12-07-2020 01:47 PM
 | 2471 | 12-07-2020 09:21 AM
 | 1613 | 10-14-2020 11:15 AM
03-28-2017
10:16 AM
This sounds like it could be a bug then. Is there any way you could provide us with more information about the files that are causing problems so we can try to reproduce it in-house? Having the actual data is ideal, but even information about the file sizes may be helpful.
03-24-2017
01:34 PM
I'm not sure that I've seen a problem exactly like that before. That error message occurs when Impala can't read the Parquet file footer correctly. We sometimes see problems like this when overwriting files in place, because Impala's metadata about file sizes gets out of sync with the actual state of the filesystem. Does your workload involve anything like that? Have you tried running "REFRESH <table>" to force a refresh of the file metadata? Just to check - is the HDFS caching addressing a specific performance problem? Often it's not necessary because the operating system is pretty effective at caching frequently-accessed files.
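As a minimal example (the table name here is only a placeholder), the refresh from impala-shell would look like:
-- Reload the file metadata for the affected table after its files were
-- rewritten in place ('web_logs' is a made-up name)
REFRESH web_logs;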
03-21-2017
02:00 PM
I should clarify that DECIMAL_V2 is currently just an experimental flag and the behaviour may change. Its behaviour will likely be in flux until all the subtasks of https://issues.apache.org/jira/browse/IMPALA-4924 are finished.
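For anyone experimenting, the flag is set per session as a query option - a sketch only, since the option is experimental and its behaviour may change between releases:
-- Enable the experimental DECIMAL V2 behaviour for this session
SET DECIMAL_V2=true;
-- Then re-run the decimal expressions of interest, e.g.
SELECT 5.56 - 0.36;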
03-21-2017
11:29 AM
I think it was probably unable to get enough memory because of other concurrently executing queries. This is somewhat counterintuitive, but if you set the mem_limit query option to an amount of memory that the query can reliably obtain, e.g. 2GB, then when it hits that limit spill-to-disk will kick in and the query should be able to complete (albeit slower than running fully in memory). We generally recommend that all queries run with a mem_limit set. You can configure a default mem_limit via the "default query options" config or by setting up memory-based admission control. We have some good docs about how to set up memory-based admission control here: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html#admission_memory We're actively working on improving this so that it's more hands-off.
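As a hedged example, setting a per-session limit from impala-shell looks like this (the 2GB value is only illustrative - pick a value the query can reliably obtain on your cluster):
-- Cap queries in this session at 2GB; once a query reaches the limit it
-- spills to disk instead of failing with "memory limit exceeded"
SET MEM_LIMIT=2g;
-- then run the problematic query as usual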
03-21-2017
11:22 AM
I think there are a couple of things going on. You may be running into some of the rounding issues covered by https://issues.apache.org/jira/browse/IMPALA-4810. In Impala 2.9 there will be a DECIMAL_V2 query option that will switch to a decimal mode that addresses some of these problems. The expression (5.56 - 0.36) / 1200 is also treated as DOUBLE rather than DECIMAL, which is confusing; we plan to fix that in https://issues.apache.org/jira/browse/IMPALA-3437. If you change 1200 to 1200.0 you'll actually get a DECIMAL result. You can use the typeof() function to inspect the result types of expressions:
[localhost:21000] > select typeof((5.56 - 0.36) / 1200), typeof((5.56 - 0.36) / 1200.0);
+------------------------------+--------------------------------+
| typeof((5.56 - 0.36) / 1200) | typeof((5.56 - 0.36) / 1200.0) |
+------------------------------+--------------------------------+
| DOUBLE | DECIMAL(11,8) |
+------------------------------+--------------------------------+
Fetched 1 row(s) in 0.01s
03-17-2017
08:53 AM
1 Kudo
This is a bug in the impala-udf-dev package in versions 5.9.x to 5.10.x. It was always intended to be compilable with older versions of gcc. It will be fixed in 5.11+ once that is released. If you downgrade the package to version 5.8.x or earlier, it should also work.
03-14-2017
11:21 AM
One possible explanation is that the impalad crashed because of some problem with the data file. Are there any hs_err_pid*.log files in /var/log/impalad? Or any *.dmp files?
02-08-2017
03:42 PM
There may also be more details about the "memory limit exceeded" error in the log of a different Impala daemon, the one where it actually ran out of memory. I should also mention that we've been making a big push to improve the memory consumption and general performance of partitioned inserts - currently, with dynamic partitioning, memory consumption is very high when there are a large number of partitions. See https://issues.cloudera.org/browse/IMPALA-2522
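For context, a hedged sketch of the pattern that hits this (the table and column names are made up): each distinct value of the partition key in the select list gets its own partition writer, so memory use grows with the number of partitions being written.
-- Hypothetical dynamically partitioned insert; 'sales_part', 'sales_staging',
-- 'item', 'price' and 'day' are placeholder names
INSERT INTO sales_part PARTITION (day)
SELECT item, price, day FROM sales_staging;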
02-08-2017
03:39 PM
I think the key question is why the effective process memory limit is 1GB. In the error you pasted above it says the process limit is 1.00GB: "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB" You can look at the memory limits and consumption on an Impala daemon's debug page at http://hostname:25000/memz?detailed=true - that may help identify any problematic Impala daemons. Even if you're setting the memory limit to 100GB, on startup the Impala daemon does some additional checks on the amount of memory available. You can look for a couple of things in the logs:
1. If vm overcommit and swapping are both disabled, you may see a message in the WARNING and INFO logs along the lines of "This system shows a discrepancy between the available memory and the memory commit limit ..." that will explain what the effective memory limit is.
2. Impala's view of the amount of physical memory available, e.g. "I0208 14:54:02.425094 4482 init.cc:220] Physical Memory: 31.33 GB"
02-03-2017
03:04 PM
2 Kudos
It's not entirely obvious to me what changed. Since the query is grouping by a timestamp, you're running into https://issues.cloudera.org/browse/IMPALA-3884, which forces it to execute the aggregation in an interpreted mode - the relevant line from the profile is:
ExecOption: Codegen Disabled: HashTableCtx::CodegenEvalRow(): type TIMESTAMP NYI
I suspect that the performance of that interpreted code path may have changed a bit - we tend to focus most of our tuning on the codegened code path. On releases with IMPALA-3884 fixed it should be a *lot* faster.
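To illustrate, a hedged sketch of the kind of query that hits this on affected releases (the table and column names are made up):
-- Grouping directly on a TIMESTAMP column disables codegen for the
-- aggregation on releases without the IMPALA-3884 fix
-- ('events' and 'event_ts' are placeholder names)
SELECT event_ts, count(*) FROM events GROUP BY event_ts;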