<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Broadcast error in spark 3 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Broadcast-error-in-spark-3/m-p/404934#M252398</link>
    <description>&lt;P&gt;1. Check whether any broadcast hints (e.g. BROADCAST) are set at the query level.&lt;BR /&gt;2. Try increasing &lt;SPAN&gt;spark.sql.shuffle.partitions&lt;/SPAN&gt;.&lt;BR /&gt;3. You can use SQL hints such as MERGE to select a sort-merge join instead of a broadcast join.&lt;/P&gt;</description>
    <pubDate>Thu, 27 Mar 2025 13:33:03 GMT</pubDate>
    <dc:creator>haridjh</dc:creator>
    <dc:date>2025-03-27T13:33:03Z</dc:date>
    <item>
      <title>Broadcast error in spark 3</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Broadcast-error-in-spark-3/m-p/404896#M252388</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have been using Spark 2.2 in CDSW for a long time and recently started working with Spark 3 in CDP. One of my queries fails in Spark 3 with the following error:&lt;/P&gt;&lt;P&gt;Py4JJavaError: An error occurred while calling o96.sql.&lt;/P&gt;&lt;P&gt;: org.apache.spark.SparkException: Cannot broadcast the table over 512000000 rows: 1235668051 rows&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotBroadcastTableOverMaxTableRowsError(QueryExecutionErrors.scala:1824)&lt;/P&gt;&lt;P&gt;The same query runs fine in Spark 2.2 in CDSW. My Spark session configuration is as follows:&lt;/P&gt;&lt;P&gt;# SET GENERAL SPARK PROPERTIES #&lt;BR /&gt;print("Configuring General Spark Properties")&lt;BR /&gt;spark_session_builder = spark_session_builder.appName(name="Wrangler-Routine")&lt;BR /&gt;spark_session_builder = spark_session_builder.master(master="yarn")&lt;BR /&gt;spark_session_builder = spark_session_builder.enableHiveSupport()&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.yarn.queue", "root.project")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.kryoserializer.buffer", "128m")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.kryoserializer.buffer.max", "2024m")&lt;BR /&gt;# SET SPARK DRIVER PROPERTIES #&lt;BR /&gt;print("Configuring Spark Driver Properties")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.driver.cores", "16")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.driver.memory", "64g")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.driver.memoryOverhead", "8g")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.driver.maxResultSize", "16g")&lt;BR /&gt;# SET SPARK EXECUTOR PROPERTIES #&lt;BR /&gt;print("Configuring Spark Executor Properties")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.executor.instances", "16")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.executor.cores", "8")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.executor.memory", "8g")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.executor.memoryOverhead", "8g")&lt;BR /&gt;# SET SPARK SQL PROPERTIES #&lt;BR /&gt;print("Configuring Spark SQL Properties")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.sql.crossJoin.enabled", "true")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.sql.autoBroadcastJoinThreshold", "-1")&lt;BR /&gt;spark_session_builder = spark_session_builder.config("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1")&lt;BR /&gt;# INSTANTIATE SPARK SESSION #&lt;BR /&gt;print("Instantiating Spark Session")&lt;BR /&gt;spark_session = spark_session_builder.getOrCreate()&lt;/P&gt;&lt;P&gt;spark_session.sql("""my sql here""")&lt;/P&gt;&lt;P&gt;What am I missing here?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 06:20:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Broadcast-error-in-spark-3/m-p/404896#M252388</guid>
      <dc:creator>Mamun_Shaheed</dc:creator>
      <dc:date>2026-04-21T06:20:24Z</dc:date>
    </item>
    <item>
      <title>Re: Broadcast error in spark 3</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Broadcast-error-in-spark-3/m-p/404934#M252398</link>
      <description>&lt;P&gt;1. Check if you have any Hints (Broadcast) set at the query level .&amp;nbsp;&lt;BR /&gt;2. try increasing &lt;SPAN&gt;spark.sql.shuffle.partitions&lt;BR /&gt;3. you can set SQL HINTS such as MERGE to use sort merge join , instead of broadcast*&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Mar 2025 13:33:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Broadcast-error-in-spark-3/m-p/404934#M252398</guid>
      <dc:creator>haridjh</dc:creator>
      <dc:date>2025-03-27T13:33:03Z</dc:date>
    </item>
    <item>
      <title>Re: Broadcast error in spark 3</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Broadcast-error-in-spark-3/m-p/406289#M252526</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/84376"&gt;@Mamun_Shaheed&lt;/a&gt;&amp;nbsp;Did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Apr 2025 07:45:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Broadcast-error-in-spark-3/m-p/406289#M252526</guid>
      <dc:creator>haridjh</dc:creator>
      <dc:date>2025-04-15T07:45:06Z</dc:date>
    </item>
  </channel>
</rss>

