Created on 08-30-2024 01:28 AM - edited 08-30-2024 01:30 AM
I am reading an Oracle table using PySpark. The Oracle table has a NUMBER column that holds values up to 35 digits long. When I read this column with Spark, Spark infers more precision than the original (+10 precision) and ends up throwing the following error:
java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38
Please note: I am letting Spark infer the DataFrame schema, as I don't want to pass a schema explicitly.
Code:
df_rdbms_data = spark.read.jdbc(url=source_jdbc_url, table=source_table_name, properties=source_properties)
df_rdbms_data.printSchema()
df_rdbms_data.show()
Is there any way to resolve this issue other than passing an explicit schema or casting explicitly?
Thanks in advance.
Created 09-10-2024 07:50 AM
The error you are encountering, "java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38", occurs because Spark automatically infers the schema for the Oracle NUMBER type. When the column holds very wide values, such as 35 digits in your case, the decimal type Spark infers reserves extra digits for the scale (decimal(38,10)), so the total precision required ends up higher than the original values need.
To explain the issue further:
According to the documentation, Spark's decimal data type can have a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).
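To make that arithmetic concrete, here is a minimal sketch (not from the original posts) of why a 35-digit value overflows the inferred type; it assumes Spark maps the unqualified Oracle NUMBER column to decimal(38,10), as discussed later in this thread:

from pyspark.sql.types import DecimalType

# Assumption from this thread: Spark infers decimal(38,10) for an Oracle
# NUMBER column whose precision/scale are not declared.
inferred = DecimalType(38, 10)

# A value with 35 integer digits still needs the 10 fractional digits
# reserved by the scale, so the total precision required is 35 + 10 = 45.
integer_digits = 35
required_precision = integer_digits + inferred.scale
print(required_precision)  # 45 -> "Decimal precision 45 exceeds max precision 38"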
To resolve this issue, first check how wide the values in the Oracle column actually are compared with the maximum precision and scale Spark allows. You can verify this by running the following query in Oracle:
SELECT MAX(LENGTH(large_number)) FROM example_table
If that length plus the scale Spark infers pushes the total precision past 38 (in your case, 35 + 10 = 45), you can read the data as a string instead of a decimal by using the following query:
SELECT TO_CHAR(large_number) AS large_number FROM example_table
Spark schema:
>>> df = spark.read.format("jdbc") \
...     .option("url", oracle_url) \
...     .option("query", "SELECT TO_CHAR(large_number) as large_number FROM example_table_with_decimal") \
...     .option("user", "user1") \
...     .option("password", "password") \
...     .option("driver", "oracle.jdbc.driver.OracleDriver") \
...     .load()
>>> df.printSchema()
root
 |-- LARGE_NUMBER: string (nullable = true)

>>> df = spark.read.format("jdbc") \
...     .option("url", oracle_url) \
...     .option("query", "SELECT large_number FROM example_table_with_decimal") \
...     .option("user", "user1") \
...     .option("password", "password") \
...     .option("driver", "oracle.jdbc.driver.OracleDriver") \
...     .load()
>>> df.printSchema()
root
 |-- LARGE_NUMBER: decimal(35,5) (nullable = true)
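If you still need to do numeric operations on the column after reading it through TO_CHAR, one option is to cast the string back to a decimal that actually fits. This is only a sketch: decimal(38,0) below assumes the values are 35-digit integers; pick a scale that matches your data, keeping integer digits plus scale within 38.

from pyspark.sql.functions import col

# Hypothetical follow-up (not from the thread): convert the TO_CHAR'd string
# column back to an explicit decimal wide enough for the data.
df_typed = df.withColumn("LARGE_NUMBER", col("LARGE_NUMBER").cast("decimal(38,0)"))
df_typed.printSchema()
# root
#  |-- LARGE_NUMBER: decimal(38,0) (nullable = true)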
Created 09-11-2024 08:47 AM
@ggangadharan thanks for your reply.
Yes, as soon as Spark sees the NUMBER data type in Oracle, it maps the DataFrame column to decimal(38,10). When the integer part of the Oracle value has more than 28 digits, Spark can't accommodate it, because decimal(38,10) reserves 10 digits for the scale and leaves only 28 for the integer part, hence this error. As you said, the practical solution is to read it as a string type.
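As a small illustrative check of that digit budget (my own helper, not part of any Spark API): a DecimalType leaves precision minus scale digits for the integer part, which is why decimal(38,10) tops out at 28 integer digits while decimal(38,0) can hold all 35.

from pyspark.sql.types import DecimalType

def max_integer_digits(dt: DecimalType) -> int:
    # Digits left for the integer part once the scale is reserved.
    return dt.precision - dt.scale

print(max_integer_digits(DecimalType(38, 10)))  # 28 -> a 35-digit value cannot fit
print(max_integer_digits(DecimalType(38, 0)))   # 38 -> a 35-digit value fits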