<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark error - Decimal precision exceeds max precision 38 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393300#M248425</link>
    <description>&lt;P&gt;&lt;SPAN&gt;The error you are encountering, &lt;STRONG&gt;java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38&lt;/STRONG&gt;, occurs because Spark automatically infers the schema for the Oracle NUMBER type. When a NUMBER column is declared without an explicit precision and scale, Spark's JDBC reader maps it to decimal(38,10) by default, so a value with 35 digits before the decimal point requires a precision of 35 + 10 = 45, which exceeds the limit.&lt;BR /&gt;&lt;BR /&gt;To explain the issue further:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Oracle's NUMBER data type is highly flexible and can store values with a very large precision.&lt;/LI&gt;&lt;LI&gt;Spark's Decimal type, however, has a maximum precision of 38, which limits the number of digits it can represent exactly.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;According to the &lt;A href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DecimalType.html" target="_self"&gt;documentation&lt;/A&gt;, Spark's decimal data type can have a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).&lt;BR /&gt;&lt;BR /&gt;To resolve this issue, first check whether your Oracle data actually exceeds the precision Spark can represent. 
You can verify this by running the following query in Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT MAX(LENGTH(large_number)) FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;If the result is greater than 38, read the column as a string instead of a decimal by pushing the conversion down to Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT TO_CHAR(large_number) AS large_number FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;Spark schema:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT TO_CHAR(large_number) AS large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: string (nullable = true)

&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: decimal(35,5) (nullable = true)

&amp;gt;&amp;gt;&amp;gt;&lt;/LI-CODE&gt;</description>
    <pubDate>Tue, 10 Sep 2024 14:50:52 GMT</pubDate>
    <dc:creator>ggangadharan</dc:creator>
    <dc:date>2024-09-10T14:50:52Z</dc:date>
    <item>
      <title>Spark error - Decimal precision exceeds max precision 38</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/392840#M248269</link>
      <description>&lt;P&gt;I am reading an Oracle table using PySpark. My Oracle table has a NUMBER column whose values are up to 35 digits long. When I read this column with Spark, it assumes more precision than the original (+10 precision) and ends up throwing the following error:&lt;/P&gt;&lt;PRE&gt;java.lang.ArithmeticException: Decimal precision &lt;SPAN class="hljs-number"&gt;45&lt;/SPAN&gt; exceeds &lt;SPAN class="hljs-built_in"&gt;max&lt;/SPAN&gt; precision &lt;SPAN class="hljs-number"&gt;38&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;Please note: I am inferring the schema from the Spark DataFrame, as I don't want to pass a schema explicitly.&lt;/P&gt;&lt;P&gt;Code:&lt;/P&gt;&lt;PRE&gt;df_rdbms_data = spark.read.jdbc(url=source_jdbc_url, table=source_table_name, properties=source_properties)
df_rdbms_data.printSchema()
df_rdbms_data.show()&lt;/PRE&gt;&lt;P&gt;Is there any way to resolve this issue apart from not passing explicit schema or explicit casting?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 08:30:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/392840#M248269</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-08-30T08:30:14Z</dc:date>
    </item>
    <item>
      <title>Re: Spark error - Decimal precision exceeds max precision 38</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393300#M248425</link>
      <description>&lt;P&gt;&lt;SPAN&gt;The error you are encountering, &lt;STRONG&gt;java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38&lt;/STRONG&gt;, occurs because Spark automatically infers the schema for the Oracle NUMBER type. When a NUMBER column is declared without an explicit precision and scale, Spark's JDBC reader maps it to decimal(38,10) by default, so a value with 35 digits before the decimal point requires a precision of 35 + 10 = 45, which exceeds the limit.&lt;BR /&gt;&lt;BR /&gt;To explain the issue further:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Oracle's NUMBER data type is highly flexible and can store values with a very large precision.&lt;/LI&gt;&lt;LI&gt;Spark's Decimal type, however, has a maximum precision of 38, which limits the number of digits it can represent exactly.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;According to the &lt;A href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DecimalType.html" target="_self"&gt;documentation&lt;/A&gt;, Spark's decimal data type can have a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).&lt;BR /&gt;&lt;BR /&gt;To resolve this issue, first check whether your Oracle data actually exceeds the precision Spark can represent. 
You can verify this by running the following query in Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT MAX(LENGTH(large_number)) FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;If the result is greater than 38, read the column as a string instead of a decimal by pushing the conversion down to Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT TO_CHAR(large_number) AS large_number FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;Spark schema:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT TO_CHAR(large_number) AS large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: string (nullable = true)
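# Why Spark reported precision 45 in the first place -- a sketch of the
# arithmetic, assuming the decimal(38,10) default that Spark's JDBC dialect
# applies to an unconstrained Oracle NUMBER column:
integer_digits = 35                 # digits before the decimal point in the data
default_scale = 10                  # fractional digits Spark reserves by default
required_precision = integer_digits + default_scale
print(required_precision)           # 45, past Spark's maximum of 38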

&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: decimal(35,5) (nullable = true)
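# Contrast of the two reads above, sketched with only the standard library:
# as a string, all 35 significant digits survive, while decimal(35,5) was
# presumably inferred because this column is declared NUMBER(35,5) in Oracle.
# The literal below is a made-up value matching the decimal(35,5) shape.
from decimal import Decimal
raw = "123456789012345678901234567890.12345"  # 30 integer + 5 fractional digits
print(str(Decimal(raw)) == raw)               # True: nothing lost as a string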

&amp;gt;&amp;gt;&amp;gt;&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 10 Sep 2024 14:50:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393300#M248425</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-09-10T14:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: Spark error - Decimal precision exceeds max precision 38</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393383#M248436</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;&amp;nbsp;thanks for your reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, as soon as Spark sees the NUMBER data type in Oracle it converts the DataFrame column to decimal(38,10). When the Oracle column holds more than 28 digits before the decimal point, Spark can't accommodate it, since decimal(38,10) leaves only 38 - 10 = 28 integer digits; hence this issue. As you said, the probable solution is to cast it to string type.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2024 15:47:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393383#M248436</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-09-11T15:47:40Z</dc:date>
    </item>
  </channel>
</rss>

