- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive: cast String to Float alters decimal part
- Labels:
-
Apache Hive
-
Apache Impala
Created on 11-05-2023 03:51 AM - edited 11-05-2023 03:55 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have a column where float values were stored as STRING type, e.g. 9989.64
When I cast that column to obtain a FLOAT like so "SELECT a, cast(a AS FLOAT) ... " I obtain the following:
a | cast(a AS FLOAT) |
5724.95 | 5724.9501953125 |
Why does Hive alter the decimal part? How can I get the exact number?
BTW, casting the STRING to DOUBLE does render the expected behaviour
BR
Created 11-06-2023 03:10 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The behavior you're observing is related to the precision differences between STRING and FLOAT data types. When you cast a STRING to a FLOAT, Hive attempts to interpret and represent the value as accurately as possible within the constraints of a FLOAT data type.
FLOATs are limited in precision, and the fractional part might not be represented exactly. In your example, "5724.95" in FLOAT was stored as "5724.9501953125." This discrepancy is due to the way binary floating-point numbers work and how they might not be able to precisely represent certain decimal values.
If you need exact decimal representation, you should consider using a DECIMAL data type instead of FLOAT. DECIMAL provides higher precision and is better suited for scenarios where you need to maintain the exact decimal value without potential loss of precision.
Here's how you can cast your STRING column to DECIMAL to preserve the exact decimal value:
SELECT a, CAST(a AS DECIMAL(20, 10)) AS exact_value FROM your_table;
In this example, DECIMAL(20, 10) indicates a decimal type with a total width of 20 digits and 10 decimal places. This will preserve the exact decimal representation you need.
Keep in mind that DECIMAL has higher storage requirements compared to FLOAT because it maintains precision, so choose the appropriate data type based on your requirements.
Example :
0: jdbc:hive2://nightly-71x-zg-2.nightly-71x-> SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value ;
INFO : Compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:decimal_value, type:decimal(20,10), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.062 seconds
INFO : Executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO : Completed executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.006 seconds
INFO : OK
+------------------+
| decimal_value |
+------------------+
| 5724.9501953125 |
+------------------+
Created 11-06-2023 02:34 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@ess, Welcome to our community! To help you get the best possible answer, I have tagged in our Hive experts @asish @ggangadharan @tjangid who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur,Community Manager
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Created 11-06-2023 03:10 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The behavior you're observing is related to the precision differences between STRING and FLOAT data types. When you cast a STRING to a FLOAT, Hive attempts to interpret and represent the value as accurately as possible within the constraints of a FLOAT data type.
FLOATs are limited in precision, and the fractional part might not be represented exactly. In your example, "5724.95" in FLOAT was stored as "5724.9501953125." This discrepancy is due to the way binary floating-point numbers work and how they might not be able to precisely represent certain decimal values.
If you need exact decimal representation, you should consider using a DECIMAL data type instead of FLOAT. DECIMAL provides higher precision and is better suited for scenarios where you need to maintain the exact decimal value without potential loss of precision.
Here's how you can cast your STRING column to DECIMAL to preserve the exact decimal value:
SELECT a, CAST(a AS DECIMAL(20, 10)) AS exact_value FROM your_table;
In this example, DECIMAL(20, 10) indicates a decimal type with a total width of 20 digits and 10 decimal places. This will preserve the exact decimal representation you need.
Keep in mind that DECIMAL has higher storage requirements compared to FLOAT because it maintains precision, so choose the appropriate data type based on your requirements.
Example :
0: jdbc:hive2://nightly-71x-zg-2.nightly-71x-> SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value ;
INFO : Compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:decimal_value, type:decimal(20,10), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.062 seconds
INFO : Executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO : Completed executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.006 seconds
INFO : OK
+------------------+
| decimal_value |
+------------------+
| 5724.9501953125 |
+------------------+
Created 11-08-2023 01:17 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To add to the point of @ggangadharan, there are lots of good articles/posts why the float and even the double datatype has these problems. Note that this is not Hive / Hadoop or Java specific.
https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency
https://dzone.com/articles/never-use-float-and-double-for-monetary-calculatio
Miklos