<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark error - Decimal precision exceeds max precision 38 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393300#M248425</link>
    <description>&lt;P&gt;&lt;SPAN&gt;The error you are encountering, &lt;STRONG&gt;java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38&lt;/STRONG&gt;, occurs because Spark automatically infers the schema for the Oracle NUMBER type. When a NUMBER column is declared without an explicit precision and scale, Spark's JDBC reader maps it to decimal(38,10) by default, so a value with 35 digits before the decimal point requires a precision of 35 + 10 = 45, which exceeds the limit.&lt;BR /&gt;&lt;BR /&gt;To explain the issue further:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Oracle's NUMBER data type is highly flexible and can store values with a very large precision.&lt;/LI&gt;&lt;LI&gt;Spark's Decimal type, however, has a maximum precision of 38, which limits the number of digits it can represent exactly.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;According to the &lt;A href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DecimalType.html" target="_self"&gt;documentation&lt;/A&gt;, Spark's decimal data type can have a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).&lt;BR /&gt;&lt;BR /&gt;To resolve this issue, first check whether your Oracle data actually exceeds the precision Spark can represent. 
You can verify this by running the following query in Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT MAX(LENGTH(large_number)) FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;If the result is greater than 38, read the column as a string instead of a decimal by pushing the conversion down to Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT TO_CHAR(large_number) AS large_number FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;Spark schema:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT TO_CHAR(large_number) AS large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: string (nullable = true)

&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: decimal(35,5) (nullable = true)

&amp;gt;&amp;gt;&amp;gt;&lt;/LI-CODE&gt;</description>
    <pubDate>Tue, 10 Sep 2024 14:50:52 GMT</pubDate>
    <dc:creator>ggangadharan</dc:creator>
    <dc:date>2024-09-10T14:50:52Z</dc:date>
    <item>
      <title>Spark error - Decimal precision exceeds max precision 38</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/392840#M248269</link>
      <description>&lt;P&gt;I am reading an Oracle table using PySpark. My Oracle table has a NUMBER column whose values are up to 35 digits long. When I read this column with Spark, it assumes more precision than the original (+10 precision) and ends up throwing the following error:&lt;/P&gt;&lt;PRE&gt;java.lang.ArithmeticException: Decimal precision &lt;SPAN class="hljs-number"&gt;45&lt;/SPAN&gt; exceeds &lt;SPAN class="hljs-built_in"&gt;max&lt;/SPAN&gt; precision &lt;SPAN class="hljs-number"&gt;38&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;Please note: I am inferring the schema from the Spark DataFrame, as I don't want to pass a schema explicitly.&lt;/P&gt;&lt;P&gt;Code:&lt;/P&gt;&lt;PRE&gt;df_rdbms_data = spark.read.jdbc(url=source_jdbc_url, table=source_table_name, properties=source_properties)
df_rdbms_data.printSchema()
df_rdbms_data.show()&lt;/PRE&gt;&lt;P&gt;Is there any way to resolve this issue apart from not passing explicit schema or explicit casting?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2024 08:30:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/392840#M248269</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-08-30T08:30:14Z</dc:date>
    </item>
    <item>
      <title>Re: Spark error - Decimal precision exceeds max precision 38</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393300#M248425</link>
      <description>&lt;P&gt;&lt;SPAN&gt;The error you are encountering, &lt;STRONG&gt;java.lang.ArithmeticException: Decimal precision 45 exceeds max precision 38&lt;/STRONG&gt;, occurs because Spark automatically infers the schema for the Oracle NUMBER type. When a NUMBER column is declared without an explicit precision and scale, Spark's JDBC reader maps it to decimal(38,10) by default, so a value with 35 digits before the decimal point requires a precision of 35 + 10 = 45, which exceeds the limit.&lt;BR /&gt;&lt;BR /&gt;To explain the issue further:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Oracle's NUMBER data type is highly flexible and can store values with a very large precision.&lt;/LI&gt;&lt;LI&gt;Spark's Decimal type, however, has a maximum precision of 38, which limits the number of digits it can represent exactly.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;According to the &lt;A href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DecimalType.html" target="_self"&gt;documentation&lt;/A&gt;, Spark's decimal data type can have a precision of up to 38, and the scale can also be up to 38 (but must be less than or equal to the precision).&lt;BR /&gt;&lt;BR /&gt;To resolve this issue, first check whether your Oracle data actually exceeds the precision Spark can represent. 
You can verify this by running the following query in Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT MAX(LENGTH(large_number)) FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;If the result is greater than 38, read the column as a string instead of a decimal by pushing the conversion down to Oracle:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SELECT TO_CHAR(large_number) AS large_number FROM example_table&lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;BR /&gt;Spark schema:&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT TO_CHAR(large_number) AS large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: string (nullable = true)
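# Why Spark reported precision 45 in the first place -- a sketch of the
# arithmetic, assuming the decimal(38,10) default that Spark's JDBC dialect
# applies to an unconstrained Oracle NUMBER column:
integer_digits = 35                 # digits before the decimal point in the data
default_scale = 10                  # fractional digits Spark reserves by default
required_precision = integer_digits + default_scale
print(required_precision)           # 45, past Spark's maximum of 38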

&amp;gt;&amp;gt;&amp;gt; df=spark.read.format("jdbc").option("url", oracle_url).option("query", "SELECT large_number FROM example_table_with_decimal").option("user", "user1").option("password", "password").option("driver", "oracle.jdbc.driver.OracleDriver").load()
&amp;gt;&amp;gt;&amp;gt; df.printSchema()
root
 |-- LARGE_NUMBER: decimal(35,5) (nullable = true)
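# Contrast of the two reads above, sketched with only the standard library:
# as a string, all 35 significant digits survive, while decimal(35,5) was
# presumably inferred because this column is declared NUMBER(35,5) in Oracle.
# The literal below is a made-up value matching the decimal(35,5) shape.
from decimal import Decimal
raw = "123456789012345678901234567890.12345"  # 30 integer + 5 fractional digits
print(str(Decimal(raw)) == raw)               # True: nothing lost as a string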

&amp;gt;&amp;gt;&amp;gt;&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 10 Sep 2024 14:50:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393300#M248425</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2024-09-10T14:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: Spark error - Decimal precision exceeds max precision 38</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393383#M248436</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92016"&gt;@ggangadharan&lt;/a&gt;&amp;nbsp;thanks for your reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, as soon as Spark sees the NUMBER data type in Oracle it converts the DataFrame column to decimal(38,10). When the Oracle column holds more than 28 digits before the decimal point, Spark can't accommodate it, since decimal(38,10) leaves only 38 - 10 = 28 integer digits; hence this issue. As you said, the probable solution is to cast it to string type.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2024 15:47:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-error-Decimal-precision-exceeds-max-precision-38/m-p/393383#M248436</guid>
      <dc:creator>bigdatacm</dc:creator>
      <dc:date>2024-09-11T15:47:40Z</dc:date>
    </item>
  </channel>
</rss>

