
Spark error - Decimal precision exceeds max precision 38

Contributor

I am reading an Oracle table using PySpark. My Oracle table contains a NUMBER column whose values are up to 35 digits long. When I read this column with Spark, it seems Spark assumes more precision than the original (+10 precision) and ends up throwing the following error:

java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38

Please note: I am letting Spark infer the schema for the DataFrame, as I don't want to pass a schema explicitly.

code:

df_rdbms_data = spark.read.jdbc(url=source_jdbc_url, table=source_table_name, properties=source_properties)
df_rdbms_data.printSchema()
df_rdbms_data.show()

Is there any way to resolve this issue other than passing an explicit schema or casting the column explicitly?

Thanks in advance.

1 ACCEPTED SOLUTION

Super Collaborator

The error you are encountering, java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38, occurs because Spark automatically infers the schema for the Oracle NUMBER type. When the data has a very large precision, such as 35 digits in your case, Spark may assume more precision than the original due to how it maps floating-point and decimal values.

To explain the issue further:

  • Oracle's NUMBER data type is highly flexible and can store values with a very large precision.
  • However, Spark's Decimal type has a maximum precision of 38, which limits the number of digits it can accurately represent.

 

According to the documentation, Spark's decimal data type can have a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).
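To make the arithmetic concrete, here is a small illustrative sketch (plain Python, not a Spark API call) of why a 35-digit integer cannot fit once Spark maps an unqualified Oracle NUMBER column to decimal(38,10), as noted in the follow-up reply below; the 35-digit width comes from the question:

# decimal(38,10) leaves 38 - 10 = 28 digits for the integer part
precision, scale = 38, 10
max_integer_digits = precision - scale        # 28
value_integer_digits = 35                     # width of the values in the question
print(value_integer_digits <= max_integer_digits)   # False: 35 digits + scale 10 = precision 45 > 38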

To resolve this issue, first check whether your Oracle column holds values wider than the maximum precision and scale Spark allows. You can verify this by running the following query in Oracle:

SELECT MAX(LENGTH(large_number)) FROM example_table
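If you prefer to run this check from PySpark instead of an Oracle client, a minimal sketch is shown below. It reuses source_jdbc_url and source_properties from the question; the table and column names (example_table, large_number) are the illustrative ones used in this answer:

# Wrap the length check in a subquery so Oracle computes it and Spark only fetches one row
check_df = spark.read.jdbc(
    url=source_jdbc_url,
    table="(SELECT MAX(LENGTH(large_number)) AS max_digits FROM example_table) t",
    properties=source_properties,
)
check_df.show()   # a result greater than 38 will not fit in Spark's decimal type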



If the result is greater than 38, you can try using the following query to read the data as a string instead of a decimal data type:

SELECT TO_CHAR(large_number) AS large_number FROM example_table

 


Spark schema (TO_CHAR vs. direct read):

>>> df = spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT TO_CHAR(large_number) as large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
>>> df.printSchema()
root
 |-- LARGE_NUMBER: string (nullable = true)

>>> df = spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
>>> df.printSchema()
root
 |-- LARGE_NUMBER: decimal(35,5) (nullable = true)





2 REPLIES 2


Contributor

@ggangadharan, thanks for your reply.

 

Yes, as soon as Spark sees the NUMBER data type in Oracle, it maps the DataFrame column to decimal(38,10). When the integer part of the Oracle value has more than 28 digits, Spark can't accommodate it, since decimal(38,10) only allows 28 digits before the decimal point, hence this issue. As you said, the practical solution is to cast it to string type.
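For reference, a minimal sketch of how that workaround could be applied to the original read in the question; the column name large_number and the subquery alias t are illustrative, while source_jdbc_url, source_table_name, and source_properties are the variables from the question:

# Push TO_CHAR down to Oracle so Spark infers the column as string instead of decimal(38,10);
# any other columns you need would be listed alongside the TO_CHAR expression.
query = f"(SELECT TO_CHAR(large_number) AS large_number FROM {source_table_name}) t"
df_rdbms_data = spark.read.jdbc(url=source_jdbc_url, table=query, properties=source_properties)
df_rdbms_data.printSchema()   # LARGE_NUMBER: string (nullable = true)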