<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: spark job shuffle write super slow in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/spark-job-shuffle-write-super-slow/m-p/220401#M182286</link>
    <description>&lt;P&gt;Hey &lt;A rel="user" href="https://community.cloudera.com/users/11642/pradeepbill.html" nodeid="11642"&gt;@pradeep arumalla&lt;/A&gt;!&lt;BR /&gt;I'm not a specialist in coding or spark, but did you tried to change your &lt;STRONG&gt;groupByKey&lt;/STRONG&gt; for &lt;STRONG&gt;reduceByKey&lt;/STRONG&gt; (at lhe last line)?&lt;/P&gt;&lt;P&gt;And about the executors --num-executors, how are you launching your job, is it by spark-submit? Could you share with us?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;BTW: here's some links about shuffling &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; &lt;BR /&gt;&lt;A href="https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-shuffle.html" target="_blank"&gt;https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-shuffle.html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://0x0fff.com/spark-architecture-shuffle/" target="_blank"&gt;https://0x0fff.com/spark-architecture-shuffle/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
    <pubDate>Tue, 12 Jun 2018 22:14:22 GMT</pubDate>
    <dc:creator>vmurakami</dc:creator>
    <dc:date>2018-06-12T22:14:22Z</dc:date>
  </channel>
</rss>

