<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Error in Spark Application - Missing an output location for shuffle 2 in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Error-in-Spark-Application-Missing-an-output-location-for/m-p/200645#M162665</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/15105/rahgulati.html" nodeid="15105"&gt;@rahul gulati&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Apparently, the number of partitions of your DataFrame / RDD is causing the issue.&lt;/P&gt;&lt;P&gt;This can be controlled by adjusting the spark.default.parallelism parameter in the Spark context, or by calling .repartition(&amp;lt;desired number&amp;gt;) on the DataFrame / RDD.&lt;/P&gt;&lt;P&gt;When you run in spark-shell, check the deploy mode and the number of cores allocated for the execution, and adjust the value to whichever works for shell mode.&lt;/P&gt;&lt;P&gt;Alternatively, you can observe the same from the Spark UI and decide on a suitable number of partitions.&lt;/P&gt;&lt;P&gt;# From the Spark documentation on spark.default.parallelism:&lt;/P&gt;&lt;P&gt;For distributed shuffle operations like reduceByKey and join, the largest number of partitions in a parent RDD. For operations like parallelize with no parent RDDs, it depends on the cluster manager:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Local mode: number of cores on the local machine&lt;/LI&gt;
&lt;LI&gt;Others: total number of cores on all executor nodes or 2, whichever is larger&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Wed, 14 Jun 2017 14:08:18 GMT</pubDate>
    <dc:creator>bkosaraju</dc:creator>
    <dc:date>2017-06-14T14:08:18Z</dc:date>
  </channel>
</rss>

