Solved: Hive: cast String to Float alters decimal part - Cloudera Community - 378582

Support Questions

Find answers, ask questions, and share your expertise

Hive: cast String to Float alters decimal part

avatar
New Contributor

Hi,

I have a column where float values were stored as STRING type, e.g. 9989.64

When I cast that column to obtain a FLOAT like so "SELECT a, cast(a AS FLOAT) ... " I obtain the following:

acast(a AS FLOAT)
5724.95 5724.9501953125

  Why does Hive alter the decimal part? How can I get the exact number?

BTW, casting the STRING to DOUBLE does render the expected behaviour

BR

1 ACCEPTED SOLUTION

avatar
Master Collaborator

The behavior you're observing is related to the precision differences between STRING and FLOAT data types. When you cast a STRING to a FLOAT, Hive attempts to interpret and represent the value as accurately as possible within the constraints of a FLOAT data type.

FLOATs are limited in precision, and the fractional part might not be represented exactly. In your example, "5724.95" in FLOAT was stored as "5724.9501953125." This discrepancy is due to the way binary floating-point numbers work and how they might not be able to precisely represent certain decimal values.

If you need exact decimal representation, you should consider using a DECIMAL data type instead of FLOAT. DECIMAL provides higher precision and is better suited for scenarios where you need to maintain the exact decimal value without potential loss of precision.

Here's how you can cast your STRING column to DECIMAL to preserve the exact decimal value:

 

SELECT a, CAST(a AS DECIMAL(20, 10)) AS exact_value FROM your_table;

 

 

In this example, DECIMAL(20, 10) indicates a decimal type with a total width of 20 digits and 10 decimal places. This will preserve the exact decimal representation you need.

Keep in mind that DECIMAL has higher storage requirements compared to FLOAT because it maintains precision, so choose the appropriate data type based on your requirements.

Example :

 

0: jdbc:hive2://nightly-71x-zg-2.nightly-71x-> SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value ;
INFO  : Compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:decimal_value, type:decimal(20,10), comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.062 seconds
INFO  : Executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO  : Completed executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.006 seconds
INFO  : OK
+------------------+
|  decimal_value   |
+------------------+
| 5724.9501953125  |
+------------------+

 

View solution in original post

3 REPLIES 3

avatar
Community Manager

@ess, Welcome to our community! To help you get the best possible answer, I have tagged in our Hive experts @asish @ggangadharan @tjangid  who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Master Collaborator

The behavior you're observing is related to the precision differences between STRING and FLOAT data types. When you cast a STRING to a FLOAT, Hive attempts to interpret and represent the value as accurately as possible within the constraints of a FLOAT data type.

FLOATs are limited in precision, and the fractional part might not be represented exactly. In your example, "5724.95" in FLOAT was stored as "5724.9501953125." This discrepancy is due to the way binary floating-point numbers work and how they might not be able to precisely represent certain decimal values.

If you need exact decimal representation, you should consider using a DECIMAL data type instead of FLOAT. DECIMAL provides higher precision and is better suited for scenarios where you need to maintain the exact decimal value without potential loss of precision.

Here's how you can cast your STRING column to DECIMAL to preserve the exact decimal value:

 

SELECT a, CAST(a AS DECIMAL(20, 10)) AS exact_value FROM your_table;

 

 

In this example, DECIMAL(20, 10) indicates a decimal type with a total width of 20 digits and 10 decimal places. This will preserve the exact decimal representation you need.

Keep in mind that DECIMAL has higher storage requirements compared to FLOAT because it maintains precision, so choose the appropriate data type based on your requirements.

Example :

 

0: jdbc:hive2://nightly-71x-zg-2.nightly-71x-> SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value ;
INFO  : Compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:decimal_value, type:decimal(20,10), comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.062 seconds
INFO  : Executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf): SELECT CAST('5724.9501953125' AS DECIMAL(20, 10)) AS decimal_value
INFO  : Completed executing command(queryId=hive_20231106110627_aaa98b66-4db6-4307-9be3-598018c13fbf); Time taken: 0.006 seconds
INFO  : OK
+------------------+
|  decimal_value   |
+------------------+
| 5724.9501953125  |
+------------------+

 

avatar

To add to the point of @ggangadharan, there are lots of good articles/posts why the float and even the double datatype has these problems. Note that this is not Hive / Hadoop or Java specific.

https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency

https://dzone.com/articles/never-use-float-and-double-for-monetary-calculatio 

https://www.red-gate.com/hub/product-learning/sql-prompt/the-dangers-of-using-float-or-real-datatype... 

 Miklos