
Spark error - Decimal precision exceeds max precision 38

Contributor

I am reading an Oracle table using PySpark. My Oracle table contains a NUMBER column whose values are up to 35 digits long. When I read this column with Spark, it seems Spark assumes more precision than the original (+10 precision) and ends up throwing the following error:

java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38

Please note: I am letting Spark infer the schema for the DataFrame, as I don't want to pass a schema explicitly.

code:

df_rdbms_data = spark.read.jdbc(url=source_jdbc_url, table=source_table_name, properties=source_properties)
df_rdbms_data.printSchema()
df_rdbms_data.show()

Is there any way to resolve this issue other than passing an explicit schema or casting the column explicitly?

Thanks in advance.

1 ACCEPTED SOLUTION

Super Collaborator

The error you are encountering, java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38, occurs because Spark automatically infers the schema for the Oracle NUMBER type. When the data has a very large precision, such as 35 digits in your case, Spark may assume more precision than the original due to how it maps floating-point and decimal values.

To explain the issue further:

  • Oracle's NUMBER data type is highly flexible and can store values with a very large precision.
  • However, Spark's Decimal type has a maximum precision of 38, which limits the number of digits it can accurately represent.

 

According to the documentation, Spark's decimal data type can have a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).
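To make the arithmetic concrete, here is a small illustrative sketch (plain Python, not a Spark API call) of why a 35-digit integer cannot fit once Spark maps an unqualified Oracle NUMBER column to decimal(38,10), as noted in the follow-up reply below; the 35-digit width comes from the question:

# decimal(38,10) leaves 38 - 10 = 28 digits for the integer part
precision, scale = 38, 10
max_integer_digits = precision - scale        # 28
value_integer_digits = 35                     # width of the values in the question
print(value_integer_digits <= max_integer_digits)   # False: 35 digits + scale 10 = precision 45 > 38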

To resolve this issue, first check whether your Oracle column holds values wider than the maximum precision and scale Spark allows. You can verify this by running the following query in Oracle:

SELECT MAX(LENGTH(large_number)) FROM example_table
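If you prefer to run this check from PySpark instead of an Oracle client, a minimal sketch is shown below. It reuses source_jdbc_url and source_properties from the question; the table and column names (example_table, large_number) are the illustrative ones used in this answer:

# Wrap the length check in a subquery so Oracle computes it and Spark only fetches one row
check_df = spark.read.jdbc(
    url=source_jdbc_url,
    table="(SELECT MAX(LENGTH(large_number)) AS max_digits FROM example_table) t",
    properties=source_properties,
)
check_df.show()   # a result greater than 38 will not fit in Spark's decimal type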



If the result is greater than 38, you can try using the following query to read the data as a string instead of a decimal data type:

SELECT TO_CHAR(large_number) AS large_number FROM example_table

 


Spark schema (TO_CHAR vs. direct read):

>>> df = spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT TO_CHAR(large_number) as large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
>>> df.printSchema()
root
 |-- LARGE_NUMBER: string (nullable = true)

>>> df = spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
>>> df.printSchema()
root
 |-- LARGE_NUMBER: decimal(35,5) (nullable = true)





2 REPLIES 2


Contributor

@ggangadharan, thanks for your reply.

 

Yes, as soon as Spark sees the NUMBER data type in Oracle, it maps the DataFrame column to decimal(38,10). When the integer part of the Oracle value has more than 28 digits, Spark can't accommodate it, since decimal(38,10) only allows 28 digits before the decimal point, hence this issue. As you said, the practical solution is to cast it to string type.
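For reference, a minimal sketch of how that workaround could be applied to the original read in the question; the column name large_number and the subquery alias t are illustrative, while source_jdbc_url, source_table_name, and source_properties are the variables from the question:

# Push TO_CHAR down to Oracle so Spark infers the column as string instead of decimal(38,10);
# any other columns you need would be listed alongside the TO_CHAR expression.
query = f"(SELECT TO_CHAR(large_number) AS large_number FROM {source_table_name}) t"
df_rdbms_data = spark.read.jdbc(url=source_jdbc_url, table=query, properties=source_properties)
df_rdbms_data.printSchema()   # LARGE_NUMBER: string (nullable = true)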