Created on 03-21-2017 09:14 AM - edited 09-16-2022 04:17 AM
Hi all,
Has anybody ran into problems when dividing decimal types in Impala?
We run the following query:
select a.var1, a.var2, a.var3,
a.var1 * (1 + (a.var2- a.var3) /1200 ) as problem_statement,
150 * (1 + (5.56 - 0.36) /1200) as raw_values
from
(
select * from table_a
) a,
(select * from table_b
) b
Our results are:
a.var1 a.var2 a.var3 problem_statement raw_values
150 5.56 0.36 150.6499999995 150.65
Even though the problem_statement and raw_value are the same select statement, they get back slightly different values. Because we round some of our variables and call them later in the query, this error is altering our entire query results. The correct value is being returned by the raw_results query.
I will note that the data types for problem_statement and raw_value are different. One is a decimal data type and the other is a double
lv_hmbs_participation_bal decimal(12,2)
grosscoupon decimal(10,6)
lv_hmbs_servicing_fee decimal(10,6)
problem_statement_no_round decimal(30,13)
raw_values double
Has anybody else ran into this issue?
Created 03-21-2017 11:22 AM
I think there are a couple of things going on. You may be running into some of the rounding issues covered by https://issues.apache.org/jira/browse/IMPALA-4810. In Impala 2.9 there will be a DECIMAL_V2 query option that will switch to a decimal mode that addresses some of these problem.
The expression (5.56 - 0.36) / 1200 also is treated as DOUBLE rather than DECIMAL, which is confusing and we plan to fix https://issues.apache.org/jira/browse/IMPALA-3437. If you change 1200 to 1200.0 you'll actually get a DECIMAL result. You can use the typeof() function to inspect the result types of functions:
[localhost:21000] > select typeof((5.56 - 0.36) / 1200), typeof((5.56 - 0.36) / 1200.0); +------------------------------+--------------------------------+ | typeof((5.56 - 0.36) / 1200) | typeof((5.56 - 0.36) / 1200.0) | +------------------------------+--------------------------------+ | DOUBLE | DECIMAL(11,8) | +------------------------------+--------------------------------+ Fetched 1 row(s) in 0.01s
Created 03-21-2017 11:22 AM
I think there are a couple of things going on. You may be running into some of the rounding issues covered by https://issues.apache.org/jira/browse/IMPALA-4810. In Impala 2.9 there will be a DECIMAL_V2 query option that will switch to a decimal mode that addresses some of these problem.
The expression (5.56 - 0.36) / 1200 also is treated as DOUBLE rather than DECIMAL, which is confusing and we plan to fix https://issues.apache.org/jira/browse/IMPALA-3437. If you change 1200 to 1200.0 you'll actually get a DECIMAL result. You can use the typeof() function to inspect the result types of functions:
[localhost:21000] > select typeof((5.56 - 0.36) / 1200), typeof((5.56 - 0.36) / 1200.0); +------------------------------+--------------------------------+ | typeof((5.56 - 0.36) / 1200) | typeof((5.56 - 0.36) / 1200.0) | +------------------------------+--------------------------------+ | DOUBLE | DECIMAL(11,8) | +------------------------------+--------------------------------+ Fetched 1 row(s) in 0.01s
Created 03-21-2017 02:00 PM
I should clarify that DECIMAL_V2 is currently just an experimental flag and the behaviour may change. Likely its behaviour will be in flux until all the subtasks of https://issues.apache.org/jira/browse/IMPALA-4924 are finished.
Created 03-22-2017 06:23 AM
Hi TIm, thank you for your response.
I am currently running CDH 5.8.3 and Impala 2.6. Will Impala 2.9 be included in CDH 5.11?
So it looks like my options are to either use the double type instead of the decimal type, or use 1200.0
I want to also note I am getting this issue in Hive as well as Impala
Created 03-23-2017 09:10 AM
Hi Tim,
I'd like to resond and say that we were running into the issues that you brought up, I will also note that changing our double values of 1200 to 1200.0 does seem to remedy that particular problem.
Thank you for your response.