<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Error in Spark Application - Missing an output location for shuffle 2 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Error-in-Spark-Application-Missing-an-output-location-for/m-p/200645#M162665</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/15105/rahgulati.html" nodeid="15105"&gt;@rahul gulati&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Apparently, the number of partitions of your DataFrame / RDD is causing the issue.&lt;/P&gt;&lt;P&gt;This can be controlled by adjusting the spark.default.parallelism parameter in the Spark context, or by calling .repartition(&amp;lt;desired number&amp;gt;) on the DataFrame / RDD.&lt;/P&gt;&lt;P&gt;When you run in spark-shell, check the deploy mode and the number of cores allocated for the execution, and adjust the value to whichever works for shell mode.&lt;/P&gt;&lt;P&gt;Alternatively, you can observe the same from the Spark UI and decide on a suitable number of partitions.&lt;/P&gt;&lt;P&gt;# From the Spark documentation on spark.default.parallelism:&lt;/P&gt;&lt;P&gt;For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Local mode: number of cores on the local machine&lt;/LI&gt;
&lt;LI&gt;Others: total number of cores on all executor nodes or 2, whichever is larger&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Wed, 14 Jun 2017 14:08:18 GMT</pubDate>
    <dc:creator>bkosaraju</dc:creator>
    <dc:date>2017-06-14T14:08:18Z</dc:date>
  </channel>
</rss>

