<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Error in Spark Application - Missing an output location for shuffle 2 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Error-in-Spark-Application-Missing-an-output-location-for/m-p/200644#M162664</link>
    <description>&lt;P&gt;I am trying to run a Spark application that reads data from Hive tables into DataFrames and joins them. When I run the DataFrame operations individually in spark-shell, all the joins work fine and I am able to persist the data in ORC format to HDFS.&lt;/P&gt;&lt;P&gt;But when I run it as an application using spark-submit, I get the following error:&lt;/P&gt;&lt;P&gt;Missing an output location for shuffle 2&lt;/P&gt;&lt;P&gt;I did some research and found that this error is related to a memory issue. I do not understand why it does not occur in spark-shell, where I use the same configuration and am able to persist everything.&lt;/P&gt;&lt;P&gt;The command I am using to run the application is:&lt;/P&gt;&lt;P&gt;spark-submit --master yarn-client --driver-memory 10g --num-executors 3 --executor-memory 10g --executor-cores 2 --class main.scala.test.Cences --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar /home/talend/test_2.11-0.0.1.jar&lt;/P&gt;&lt;P&gt;My cluster configuration is:&lt;/P&gt;&lt;P&gt;2 master nodes, 3 slave nodes (4 cores and 28 GB each), and 1 edge node.&lt;/P&gt;&lt;P&gt;The Hive tables I am reading from are only around 150 MB in size, which is very small compared to the memory I am giving to the Spark program.&lt;/P&gt;&lt;P&gt;In the application I call the following DataFrame functions along the way: saveAsTable(), write.format(), and persist().&lt;/P&gt;&lt;P&gt;Any suggestions would be really helpful.&lt;/P&gt;</description>
    <pubDate>Wed, 07 Jun 2017 17:28:44 GMT</pubDate>
    <dc:creator>munnyrahul</dc:creator>
    <dc:date>2017-06-07T17:28:44Z</dc:date>
  </channel>
</rss>

