<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark - YARN Capacity Scheduler in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167966#M130296</link>
    <description>&lt;P&gt;I think it's spark.dynamicAllocation.initialExecutors that you can set per job. Try putting it in a properties file and passing it with --properties-file. I haven't tried this myself, so let me know how it works.&lt;/P&gt;</description>
    <pubDate>Thu, 26 May 2016 04:41:34 GMT</pubDate>
    <dc:creator>ravi1</dc:creator>
    <dc:date>2016-05-26T04:41:34Z</dc:date>
    <item>
      <title>Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167959#M130289</link>
      <description>&lt;P&gt;Can we configure the Capacity Scheduler so that a Spark job only runs when it can procure enough resources?&lt;/P&gt;&lt;P&gt;In the current FIFO setup a Spark job will start running if it can get only a few of the required executors, but it will then fail because it couldn't acquire enough resources.&lt;/P&gt;&lt;P&gt;I would like the Spark job to start only when it can procure all the required resources.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 02:09:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167959#M130289</guid>
      <dc:creator>nismaily</dc:creator>
      <dc:date>2016-05-26T02:09:42Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167960#M130290</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/5185/nismaily.html" nodeid="5185"&gt;@Nasheb Ismaily&lt;/A&gt;&lt;P&gt; You might need to set minimum-user-limit-percent (say, 30%):&lt;/P&gt;&lt;PRE&gt;yarn.scheduler.capacity.root.support.services.minimum-user-limit-percent&lt;/PRE&gt;&lt;P&gt;Unless 30% of the queue capacity is available, the job will not start.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 02:25:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167960#M130290</guid>
      <dc:creator>yjagadeesan</dc:creator>
      <dc:date>2016-05-26T02:25:43Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167961#M130291</link>
      <description>&lt;P&gt;Thank you, I'll try this out.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 02:27:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167961#M130291</guid>
      <dc:creator>nismaily</dc:creator>
      <dc:date>2016-05-26T02:27:11Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167962#M130292</link>
      <description>&lt;P&gt;This only works at the user level, not at the job level. So if the user has other jobs and is already getting his percentage of the queue, the Spark job will start even before it can get the resources it needs.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 02:31:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167962#M130292</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-26T02:31:18Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167963#M130293</link>
      <description>&lt;P&gt;If you are not using dynamic allocation, the submitted job will not start until it gets all the resources. You are asking for N executors, so YARN will not let the job proceed until it gets all of them.&lt;/P&gt;&lt;P&gt;If you are using dynamic allocation, then setting spark.dynamicAllocation.minExecutors to a higher value means the job gets scheduled only once minExecutors can be met.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 04:04:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167963#M130293</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-26T04:04:03Z</dc:date>
    </item>
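As a sketch of the dynamic-allocation case described above, the relevant spark-defaults.conf lines would look roughly like this (the minExecutors value of 10 is illustrative, not from the thread; the lines are written to a scratch file here rather than the real config):

```shell
# Global spark-defaults.conf lines for the dynamic-allocation case.
# Values are illustrative; written to a scratch file for demonstration.
printf '%s\n' \
  'spark.dynamicAllocation.enabled      true' \
  'spark.shuffle.service.enabled        true' \
  'spark.dynamicAllocation.minExecutors 10' \
  > spark-defaults-sketch.conf
cat spark-defaults-sketch.conf
```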
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167964#M130294</link>
      <description>&lt;P&gt;Thanks Ravi, this is very close to what I need.&lt;/P&gt;&lt;P&gt;One question: spark.dynamicAllocation.minExecutors seems to be a global property in spark-defaults.&lt;/P&gt;&lt;P&gt;Is there a way to set this property on a job-by-job basis?&lt;/P&gt;&lt;P&gt;Spark job1 -&amp;gt; min executors 8&lt;/P&gt;&lt;P&gt;Spark job2 -&amp;gt; min executors 5&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 04:19:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167964#M130294</guid>
      <dc:creator>nismaily</dc:creator>
      <dc:date>2016-05-26T04:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167965#M130295</link>
      <description>&lt;P&gt;I agree &lt;A rel="user" href="https://community.cloudera.com/users/216/ravi.html" nodeid="216"&gt;@Ravi Mutyala&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 04:20:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167965#M130295</guid>
      <dc:creator>yjagadeesan</dc:creator>
      <dc:date>2016-05-26T04:20:15Z</dc:date>
    </item>
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167966#M130296</link>
      <description>&lt;P&gt;I think it's spark.dynamicAllocation.initialExecutors that you can set per job. Try putting it in a properties file and passing it with --properties-file. I haven't tried this myself, so let me know how it works.&lt;/P&gt;</description>
      <pubDate>Thu, 26 May 2016 04:41:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167966#M130296</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-05-26T04:41:34Z</dc:date>
    </item>
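A minimal sketch of the per-job properties-file approach suggested above. The file name job1.conf and the executor count of 8 are made up for illustration (8 matches the job1 example from the question); the spark-submit line is shown only as a comment since it depends on a live cluster:

```shell
# Write a per-job Spark properties file (illustrative values).
printf '%s\n' \
  'spark.dynamicAllocation.enabled          true' \
  'spark.dynamicAllocation.initialExecutors 8' \
  'spark.dynamicAllocation.minExecutors     8' \
  > job1.conf

# The job would then be submitted with something like (not executed here):
#   spark-submit --properties-file job1.conf ...
cat job1.conf
```

A second job could get its own file (e.g. job2.conf with minExecutors 5), giving different bounds per submission without touching the global spark-defaults.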
    <item>
      <title>Re: Spark - YARN Capacity Scheduler</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167967#M130297</link>
      <description>&lt;P&gt;Thanks Ravi,&lt;/P&gt;&lt;P&gt;I had to:&lt;/P&gt;&lt;P&gt;1) Copy the Spark shuffle JARs to the NodeManager classpath on all nodes&lt;/P&gt;&lt;P&gt;2) Add spark_shuffle to yarn.nodemanager.aux-services and set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService in yarn-site.xml (via Ambari)&lt;/P&gt;&lt;P&gt;3) Restart all NodeManagers&lt;/P&gt;&lt;P&gt;4) Add the following to spark-defaults.conf:&lt;/P&gt;&lt;P&gt;spark.dynamicAllocation.enabled true&lt;/P&gt;&lt;P&gt;spark.shuffle.service.enabled   true&lt;/P&gt;&lt;P&gt;5) Set these parameters on a per-job basis:&lt;/P&gt;&lt;P&gt;spark.dynamicAllocation.initialExecutors=#&lt;/P&gt;&lt;P&gt;spark.dynamicAllocation.minExecutors=#&lt;/P&gt;</description>
      <pubDate>Wed, 01 Jun 2016 07:48:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-YARN-Capacity-Scheduler/m-p/167967#M130297</guid>
      <dc:creator>nismaily</dc:creator>
      <dc:date>2016-06-01T07:48:58Z</dc:date>
    </item>
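The yarn-site.xml changes from step 2 of the recipe above can be sketched as the two key/value pairs involved. This is only an illustration written to a scratch file; in practice they are set via Ambari, and the pre-existing mapreduce_shuffle entry in aux-services is an assumption, not something stated in the thread:

```shell
# The two yarn-site.xml settings from step 2, as key=value pairs.
# Sketch only: the mapreduce_shuffle entry is assumed to already exist.
printf '%s\n' \
  'yarn.nodemanager.aux-services=mapreduce_shuffle,spark_shuffle' \
  'yarn.nodemanager.aux-services.spark_shuffle.class=org.apache.spark.network.yarn.YarnShuffleService' \
  > yarn-shuffle-sketch.props
cat yarn-shuffle-sketch.props
```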
  </channel>
</rss>

