Support Questions

Find answers, ask questions, and share your expertise

Wrong results dividing decimal by integer

avatar
Explorer

Test Case

Setup

 

 

create table n1 (id STRING)
stored as parquet;

insert into n1 values ('-1234');
insert into n1 values ('7890');

 

 

Results

impalad version 2.12.0-cdh5.15.2

 

 

SELECT  `id`
,       CAST(`id` AS DECIMAL(38,0)) / 20 * 20 AS D0 
,       CAST(`id` AS DECIMAL(38,0)) / 20.0 * 20 AS D1
FROM    n1
ORDER BY `id`;

+----------+-------+---------+
| id       | d0    | d1      |
+----------+-------+---------+
| -1234    | -1220 | -1234.0 |
| 7890     | 7880  | 7890.0  |
+----------+-------+---------+
Fetched 4 row(s) in 0.56s

 

 

When we are casting the "id" column to a decimal without scale and dividing by an integer, even though the answer should have a scale (-1234 / 20 = -61.7) we lose the scale and the result is truncated to -61. This is then multiplied by 20 to give -1220 which is column d0. This does not get us back to the same value that we started with which was -1234.

 

When we divide by a decimal value (20.0) the result of -61.7 is preserved, which gets us back to the correct value of the "id" in column d1.

 

impalad version 3.2.0-cdh6.2.0

 

SELECT  `id`
,       CAST(`id` AS DECIMAL(38,0)) / 20 * 20 AS D0 
,       CAST(`id` AS DECIMAL(38,0)) / 20.0 * 20 AS D1
FROM    n1
ORDER BY `id`;

+-------+--------------+--------------+
| id    | d0           | d1           |
+-------+--------------+--------------+
| -1234 | -1234.000000 | -1234.000000 |
| 7890  | 7890.000000  | 7890.000000  |
+-------+--------------+--------------+

 

Notice the change in behaviour with this later version of Impala.

 

This looks to me like a bug that has been fixed.

 

Could you please confirm and point me to the IMPALA Jira that explains this bug?

 

1 ACCEPTED SOLUTION

avatar

We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6. The default behaviour all changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.ht...

 

There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072 . I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing, based on your analysis. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6.

 

IN CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it will eventually be removed, but was kept for backward compatibility).

View solution in original post

1 REPLY 1

avatar

We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6. The default behaviour all changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.ht...

 

There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072 . I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing, based on your analysis. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6.

 

IN CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it will eventually be removed, but was kept for backward compatibility).