Member since 07-29-2015
535 Posts
140 Kudos Received
103 Solutions
My Accepted Solutions
Views | Posted
---|---
4366 | 12-18-2020 01:46 PM
2703 | 12-16-2020 12:11 PM
1794 | 12-07-2020 01:47 PM
1434 | 12-07-2020 09:21 AM
925 | 10-14-2020 11:15 AM
12-14-2018 06:11 AM
The same can be performed in Hive using concat_ws('.',from_unixtime(cast(epochmillis/1000 as BIGINT),'yyyy-MM-dd HH:mm:ss'),cast(floor(epochmillis % 1000) as STRING)) to get the timestamp with milliseconds. Is this an efficient way of doing it?
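One correctness detail is worth checking before efficiency: epochmillis % 1000 is not zero-padded, so 5 ms would come out as ".5" rather than ".005". The Python sketch below mirrors the expression above to show the difference. It assumes non-negative epoch values and formats in UTC, whereas Hive's from_unixtime uses the server's time zone. The function names are hypothetical. In Hive itself, wrapping the milliseconds in lpad(..., 3, '0') would be the analogous fix.

```python
from datetime import datetime, timezone


def _seconds_part(seconds: int) -> str:
    # Whole seconds, like from_unixtime(cast(epochmillis/1000 as BIGINT), ...);
    # UTC is an assumption here (Hive uses the server's local time zone).
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def naive_concat(epoch_millis: int) -> str:
    # Mirrors concat_ws('.', ..., cast(epochmillis % 1000 as STRING)): no padding.
    seconds, millis = divmod(epoch_millis, 1000)
    return f"{_seconds_part(seconds)}.{millis}"


def format_epoch_millis(epoch_millis: int) -> str:
    # Same idea, but zero-pads the milliseconds to three digits.
    seconds, millis = divmod(epoch_millis, 1000)
    return f"{_seconds_part(seconds)}.{millis:03d}"


print(naive_concat(1_000_005))         # 1970-01-01 00:16:40.5   (ambiguous)
print(format_epoch_millis(1_000_005))  # 1970-01-01 00:16:40.005
```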
12-08-2018 10:25 AM
Actually, scratch what I just said: that advice applies if the query is stuck in the FINISHED state. If it's stuck in the RUNNING state, the query is just taking a long time to produce any results, so you're probably getting a bad query plan on one cluster that is extremely slow to execute. For example, the join order chosen by the planner may be inefficient. Computing stats on all the tables usually improves the query plan.
11-26-2018 05:29 PM
CDH5.10.2 should have the fix for that specific issue.
11-19-2018 02:24 PM
Hi @scuffster, there are some interesting issues with the different numeric data types here: INT, DOUBLE, DECIMAL, etc. The behaviour you're seeing occurs because the first input to round() is a DOUBLE expression, and DOUBLE cannot exactly represent all decimal values. Generally, the output type of round() is the same as its input type.

Impala does support precise decimal arithmetic with the DECIMAL type. If you operate on DECIMAL columns, or cast the input to a DECIMAL type with the right precision and scale, you may get the behaviour you're hoping for. Here's a query showing the types of your expressions and an alternative version with a cast to DECIMAL:

> select typeof(269586/334026 * 100), typeof(round(269586/334026 * 100, 2)), round(269586/334026 * 100, 2), round(cast(269586/334026 * 100 as DECIMAL(20, 8)), 2);
+-------------------------------+-----------------------------------------+---------------------------------+--------------------------------------------------------+
| typeof(269586 / 334026 * 100) | typeof(round(269586 / 334026 * 100, 2)) | round(269586 / 334026 * 100, 2) | round(cast(269586 / 334026 * 100 as decimal(20,8)), 2) |
+-------------------------------+-----------------------------------------+---------------------------------+--------------------------------------------------------+
| DOUBLE | DOUBLE | 80.70999999999999 | 80.71 |
+-------------------------------+-----------------------------------------+---------------------------------+--------------------------------------------------------+
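The same DOUBLE vs. DECIMAL distinction can be seen outside Impala. Python's float is an IEEE 754 binary64 value (the same representation as Impala's DOUBLE), and the decimal module does base-10 arithmetic, loosely analogous to Impala's DECIMAL. This sketch does not reproduce Impala's exact printed output, since the two round() implementations may differ; it only shows the representation issue. The helper name and the choice of round-half-up are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_pct_decimal(num: int, den: int, places: int = 2) -> Decimal:
    # Base-10 arithmetic end to end, loosely like casting to DECIMAL before round().
    value = Decimal(num) / Decimal(den) * 100
    # Decimal(1).scaleb(-places) is 1E-2 for places=2, i.e. quantize to 0.01.
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


# 80.71 has no exact binary representation, so the float is only an approximation.
print(Decimal(80.71) == Decimal("80.71"))  # False

# A rounding surprise caused purely by binary representation (2.675 is stored
# slightly below 2.675), compared with the same rounding in decimal.
print(round(2.675, 2))                                                    # 2.67
print(Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68

print(round_pct_decimal(269586, 334026))  # 80.71
```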
11-19-2018 12:31 AM
Is there a workaround for this? We are on Impala 2.8.0, and our COMPUTE INCREMENTAL STATS queries keep getting stuck and have to be cancelled manually.
10-26-2018 03:29 AM
I have checked the writer in the file's metadata, and it is Parquet.Net version 2.1.4.298. So this is not an Impala reader issue but a Parquet.Net writer issue: the definition levels of NULLs in collections are wrong according to the Parquet spec. The issue this causes is that if the first column read is the collection with a NULL in the row, then definition level 0 is interpreted as "the whole row is NULL". If another (non-NULL) column is read first, its definition level is used to determine the parent's NULLness, so the row is not NULL. This is why adding 'id' leads to the expected results. I would not consider this a bug, but rather an optimisation (checking every column's definition level could affect performance). Parquet.Net is not part of CDH and is not an Apache project at the moment. I am not familiar with the project, so I do not know whether this is a known issue. My advice is to contact the maintainer mentioned at https://github.com/elastacloud/parquet-dotnet
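For reference, here is a minimal sketch of the (repetition level, definition level) pairs a writer would emit for one top-level list column, assuming the standard three-level LIST layout from the Parquet format spec: an optional outer group, a repeated "list" group, and an optional "element". The actual schema Parquet.Net produced is not shown in the post, and the function name is hypothetical. The key point is that a NULL element inside a non-NULL list should get definition level 2; level 0 means the list field itself is NULL.

```python
from typing import List, Optional, Tuple

# optional group my_list (LIST) { repeated group list { optional int32 element; } }
# Each optional or repeated level adds one to the maximum definition level.
MAX_DEF = 3


def encode_list(value: Optional[List[Optional[int]]]) -> List[Tuple[int, int]]:
    """Return (repetition_level, definition_level) pairs for one row's list value."""
    if value is None:
        return [(0, 0)]  # the list field itself is NULL
    if not value:
        return [(0, 1)]  # list present but empty
    pairs = []
    for i, element in enumerate(value):
        rep = 0 if i == 0 else 1  # 0 starts a new row, 1 continues the same list
        # A NULL element is still "inside" a defined list: level 2, never 0.
        definition = MAX_DEF if element is not None else MAX_DEF - 1
        pairs.append((rep, definition))
    return pairs


print(encode_list([1, None, 3]))  # [(0, 3), (1, 2), (1, 3)]
```

If the list were nested inside another optional field, every definition level above would shift up by one.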
10-10-2018 01:30 PM
MEM_LIMIT is a hard limit on the amount of memory the query can use, and it cannot be renegotiated during execution. If the default mem_limit you configured is not enough, you can either increase it or set the MEM_LIMIT query option to a higher value for just that query.
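To make the "hard, non-renegotiable" point concrete, here is a toy model, not Impala's implementation: the limit is fixed when the query starts, exceeding it fails the query, and the only way to get more headroom is a larger per-query value chosen up front. The class and function names are hypothetical, and real Impala limits are also subject to admission control, which this ignores.

```python
from typing import Optional


class QueryMemTracker:
    """Toy model of a per-query hard memory limit fixed at query start."""

    def __init__(self, limit_bytes: int) -> None:
        self._limit = limit_bytes  # no setter: the limit cannot change mid-query
        self._used = 0

    def consume(self, nbytes: int) -> None:
        # Fail instead of asking for more memory once the limit would be exceeded.
        if self._used + nbytes > self._limit:
            raise MemoryError(
                f"Memory limit exceeded: {self._used + nbytes} > {self._limit} bytes"
            )
        self._used += nbytes


def effective_limit(default_bytes: int, query_option_bytes: Optional[int] = None) -> int:
    # A per-query MEM_LIMIT option, when given, takes precedence over the default.
    return query_option_bytes if query_option_bytes is not None else default_bytes
```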
10-04-2018 11:26 PM
1 Kudo
I have figured out that this comes from the third-party tool, so it has nothing to do with the Simba driver. Thanks!
10-03-2018 10:55 AM
1 Kudo
I think the Kudu min-max filter pushdown optimisation in C5.14+ would achieve this: https://issues.apache.org/jira/browse/IMPALA-4252
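Roughly, a min-max runtime filter works as follows. The build side of a join records the minimum and maximum join key it sees, and the probe-side scan only returns rows whose key falls in that range, which Impala can hand to Kudu as range predicates. The sketch below is a simplified illustration of the concept, not Impala's code. The names are hypothetical, and integer keys are assumed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class MinMaxFilter:
    lo: int
    hi: int

    @classmethod
    def from_build_keys(cls, keys: Iterable[int]) -> "MinMaxFilter":
        keys = list(keys)
        if not keys:
            raise ValueError("empty build side: this sketch does not handle it")
        return cls(min(keys), max(keys))

    def might_match(self, key: int) -> bool:
        # Range check only: a key inside [lo, hi] may still have no join partner.
        return self.lo <= key <= self.hi


def scan_with_filter(probe_keys: Iterable[int], f: MinMaxFilter) -> Iterator[int]:
    # Stands in for the scanner applying "key >= lo AND key <= hi" before the join.
    return (k for k in probe_keys if f.might_match(k))


f = MinMaxFilter.from_build_keys([5, 9, 7])
print(list(scan_with_filter([1, 5, 6, 10], f)))  # [5, 6]
```

Note that 6 passes the filter even though it has no partner on the build side. Min-max filters are cheap but coarse, and the join still does the exact matching.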
09-21-2018 01:08 AM
I would like this too. My use case is new data files written to existing partitions, so I'm not concerned with partition discovery. Even having something like this would be useful:
REFRESH tabA
PARTITION (...)
PARTITION (...)
PARTITION (...)
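Until a multi-partition form exists, one workaround is to issue one single-partition REFRESH per affected partition, assuming an Impala release that supports REFRESH table PARTITION (col=value, ...). The helper below only builds those statement strings. The partition column names (year, month, dt) are hypothetical, and it deliberately rejects string values containing quotes or backslashes rather than guessing at escaping rules.

```python
from typing import List, Mapping, Sequence, Union

PartitionValue = Union[int, str]


def _literal(value: PartitionValue) -> str:
    # Integers are emitted bare; strings are single-quoted.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"unsupported partition value: {value!r}")
    if isinstance(value, int):
        return str(value)
    if "'" in value or "\\" in value:
        raise ValueError("quotes/backslashes are not handled in this sketch")
    return f"'{value}'"


def refresh_partition_statements(
    table: str, partitions: Sequence[Mapping[str, PartitionValue]]
) -> List[str]:
    """Build one REFRESH ... PARTITION (...) statement per partition spec."""
    statements = []
    for spec in partitions:
        keys = ", ".join(f"{col}={_literal(val)}" for col, val in spec.items())
        statements.append(f"REFRESH {table} PARTITION ({keys})")
    return statements


for stmt in refresh_partition_statements("tabA", [{"year": 2018, "month": 9}]):
    print(stmt)  # REFRESH tabA PARTITION (year=2018, month=9)
```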