Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 103
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6067 | 12-18-2020 01:46 PM |
| | 3941 | 12-16-2020 12:11 PM |
| | 2792 | 12-07-2020 01:47 PM |
| | 1992 | 12-07-2020 09:21 AM |
| | 1279 | 10-14-2020 11:15 AM |
04-12-2017
09:00 PM
Column a.t_date is a string field, not a timestamp field. Both tables are in Parquet file format. By adding more nodes to the cluster, more Impala daemons will be running; can we expect the performance of such a query to improve?
03-29-2017
01:19 AM
Unfortunately, the content of this file is under NDA, so I can't provide you the file. Some information that I can give is summarized here:

- Output from "hdfs dfs -ls": -rwxrwx--x+ 3 hive hive 1093251527 2016-09-30 21:15 /path/to/file/month=12/part-r-00000-be7725db-da77-4a34-a3c6-2e5a9276228c.snappy.parquet
- We have a _metadata and a _common_metadata file in the same directory (I tried removing them, but this did not resolve the issue)
- Compression: snappy
- It was created using: parquet-mr version 1.5.0-cdh5.7.1 (build ${buildNumber}) (output from parquet-tools, version 1.9.0)
- Software used for creation: the bundled Spark 1.6.0 from CDH 5.7.1 (in the meantime we are using CDH 5.9.0)
- The file contains 713 row groups
- The file contains 867 columns (of types int64, double and binary)

One further thing that I tried is copying the problematic file to a separate directory (without the two metadata files), creating a new table from that file with Impala, and running the test there. Unfortunately this produces exactly the same behaviour: when the file is cached I get the error message; when it is not cached, everything works fine. Let me know if this helps you in understanding this problem or if you need further information (except for the contents of the file). Thanks a lot already! Kind Regards
03-23-2017
09:10 AM
Hi Tim, I'd like to respond and confirm that we were running into the issues you brought up. I will also note that changing our double values from 1200 to 1200.0 does seem to remedy that particular problem. Thank you for your response.
03-21-2017
11:29 AM
I think it was probably unable to get enough memory because of other concurrently executing queries. This is somewhat counterintuitive, but if you set the mem_limit query option to an amount of memory that the query can reliably obtain, e.g. 2GB, then when it hits that limit spill-to-disk will kick in and the query should be able to complete (albeit slower than running fully in memory). We generally recommend that all queries run with a mem_limit set. You can configure a default mem_limit via the "default query options" config or by setting up memory-based admission control. We have some good docs about how to set up memory-based admission control here: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_admission.html#admission_memory We're actively working on improving this so that it's more hands-off.
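A minimal sketch of the workaround described above, as you would type it in an impala-shell session (the 2GB figure is the example value from the post, and the table and column names are hypothetical; tune the limit for your workload):

```sql
-- Cap this session's queries at 2GB of memory per node. Queries that hit
-- the limit will spill to disk instead of failing with an out-of-memory
-- error, at the cost of running slower than fully in-memory.
SET MEM_LIMIT=2g;

-- Then run the memory-hungry query as usual (names are illustrative only):
SELECT customer_id, COUNT(*)
FROM big_table
GROUP BY customer_id;
```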
03-17-2017
08:53 AM
1 Kudo
This is a bug in the impala-udf-dev package, versions 5.9.x to 5.10.x. It was always intended to be compilable with older versions of gcc. It will be fixed in 5.11+ once that is released. If you downgrade the package to version 5.8.x or earlier, it should also work.
03-14-2017
11:21 AM
One possible explanation is a crash if there is some problem with the data file. Are there any hs_err_pid*.log files in /var/log/impalad? Or any *.dmp files?
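A quick sketch of how you might check for those crash artifacts from a shell (/var/log/impalad is the usual CDH default log location; pass a different path if your install differs):

```shell
# find_crash_artifacts: list JVM fatal-error logs (hs_err_pid*.log) and
# minidump files (*.dmp) left behind by a crashed impalad process.
# Defaults to /var/log/impalad; accepts an alternate directory as $1.
find_crash_artifacts() {
  dir="${1:-/var/log/impalad}"
  [ -d "$dir" ] || return 0   # nothing to scan if the directory is absent
  find "$dir" -maxdepth 1 \( -name 'hs_err_pid*.log' -o -name '*.dmp' \)
}

find_crash_artifacts
```

Any file this prints is a strong hint that impalad crashed while reading the data file.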
02-08-2017
05:26 AM
It looks like the problem really is in the timestamp field. Running a similar query on a table without the timestamp column shows much better results in the new environment. Thanks for the help.
02-02-2017
10:42 AM
1 Kudo
I think this is related to https://issues.cloudera.org/browse/IMPALA-4610 . I think you already discovered the workaround of using full subqueries.
01-31-2017
05:45 PM
@Tim Armstrong Thank you very much for your explanation. 🙂 Gatsby
01-31-2017
05:25 PM
We don't support UDFs manipulating Impala's runtime data structures. We don't expose those to UDFs, since UDFs aren't really meant to do things like I/O.