Created on 04-21-2020 12:55 PM - edited 04-21-2020 12:56 PM
create table n1 (id STRING)
stored as parquet;
insert into n1 values ('-1234');
insert into n1 values ('7890');
SELECT `id`
, CAST(`id` AS DECIMAL(38,0)) / 20 * 20 AS D0
, CAST(`id` AS DECIMAL(38,0)) / 20.0 * 20 AS D1
FROM n1
ORDER BY `id`;
+----------+-------+---------+
| id | d0 | d1 |
+----------+-------+---------+
| -1234 | -1220 | -1234.0 |
| 7890 | 7880 | 7890.0 |
+----------+-------+---------+
Fetched 4 row(s) in 0.56s
When we are casting the "id" column to a decimal without scale and dividing by an integer, even though the answer should have a scale (-1234 / 20 = -61.7) we lose the scale and the result is truncated to -61. This is then multiplied by 20 to give -1220 which is column d0. This does not get us back to the same value that we started with which was -1234.
When we divide by a decimal value (20.0) the result of -61.7 is preserved, which gets us back to the correct value of the "id" in column d1.
SELECT `id`
, CAST(`id` AS DECIMAL(38,0)) / 20 * 20 AS D0
, CAST(`id` AS DECIMAL(38,0)) / 20.0 * 20 AS D1
FROM n1
ORDER BY `id`;
+-------+--------------+--------------+
| id | d0 | d1 |
+-------+--------------+--------------+
| -1234 | -1234.000000 | -1234.000000 |
| 7890 | 7890.000000 | 7890.000000 |
+-------+--------------+--------------+
Notice the change in behaviour with this later version of Impala.
This looks to me like a bug that has been fixed.
Could you please confirm and point me to the IMPALA Jira that explains this bug?
Created 04-21-2020 01:03 PM
We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6. The default behaviour all changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.ht...
There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072 . I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing, based on your analysis. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6.
IN CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it will eventually be removed, but was kept for backward compatibility).
Created 04-21-2020 01:03 PM
We did a wholesale revamp of decimal behaviour going from CDH5 to CDH6. The default behaviour all changed in CDH6.0: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.ht...
There's a whole epic JIRA capturing the changes: https://issues.apache.org/jira/browse/IMPALA-4072 . I think https://issues.apache.org/jira/browse/IMPALA-4370 might be the specific fix that you're seeing, based on your analysis. The fix version for that change is Impala 2.9.0, so the code change is in CDH5.15.2, but it was done behind the DECIMAL_V2 query option, which wasn't a supported option until CDH6.
IN CDH6 you can toggle the behaviour with the DECIMAL_V2 query option (it will eventually be removed, but was kept for backward compatibility).