<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark SQL Job stuck indefinitely at last task of a stage -- Shows INFO: BlockManagerInfo: Removed broadcast in memory in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-SQL-Job-stcuk-indefinitely-at-last-task-of-a-stage/m-p/132539#M95209</link>
    <description>&lt;P&gt;Thanks Puneet for the reply. Here is my command &amp;amp; other information.&lt;/P&gt;&lt;P&gt;spark-submit --master yarn-client --driver-memory 15g --num-executors 25 --total-executor-cores 60 --executor-memory 15g --driver-cores 2 --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -Xms10g -Xmx10g -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=20" --class logicdriver logic.jar&lt;/P&gt;&lt;P&gt;Configuration:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET hive.execution.engine=tez"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET hive.optimize.tez=true"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET hive.vectorized.execution.enabled=true"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET hive.vectorized.execution.reduce.enabled=true"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET spark.sql.shuffle.partitions=2050"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET spark.sql.hive.metastore.version=0.14.0.2.2.4.10-1"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET hive.warehouse.data.skipTrash=true"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET hive.exec.dynamic.partition=true"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET hive.exec.dynamic.partition.mode=nonstrict"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET spark.driver.maxResultSize=8192"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET spark.default.parallelism=350"&lt;/STRONG&gt;);

ContextService.&lt;EM&gt;getHiveContext&lt;/EM&gt;.sql(&lt;STRONG&gt;"SET spark.yarn.executor.memoryOverhead=1024"&lt;/STRONG&gt;);&lt;/P&gt;&lt;P&gt;Data:&lt;/P&gt;&lt;P&gt;The job reads data from 2 tables, performs a join, and puts the result in a DataFrame; it then reads new tables and joins them with the previous DataFrame. This cycle repeats 7-8 times, and finally the result is inserted into Hive.&lt;/P&gt;&lt;P&gt;The first table has 63245969 records.&lt;/P&gt;&lt;P&gt;The 2nd table has 49275922 records. All the tables have record counts in this range.&lt;/P&gt;</description>
    <pubDate>Mon, 18 Jul 2016 12:37:07 GMT</pubDate>
    <dc:creator>pkhare</dc:creator>
    <dc:date>2016-07-18T12:37:07Z</dc:date>
  </channel>
</rss>

