<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Choosing RDD Persistence and Caching with Spark on YARN in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96746#M10301</link>
    <description>&lt;P&gt;1) How should we approach the choice between persist() and cache() when running Spark on YARN? For example, how can a Spark developer estimate how much memory will be available to their YARN queue and use that number to guide their persist() choice? Or should they use some other technique?&lt;/P&gt;&lt;P&gt;&lt;A href="http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence" target="_blank"&gt;http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2) With Spark on YARN, does the "RDD" exist only as long as the Spark driver lives, only as long as the RDD's related Spark worker containers live, or for some other time frame?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Mon, 09 Nov 2015 23:52:57 GMT</pubDate>
    <dc:creator>wfloyd</dc:creator>
    <dc:date>2015-11-09T23:52:57Z</dc:date>
    <item>
      <title>Choosing RDD Persistence and Caching with Spark on YARN</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96746#M10301</link>
      <description>&lt;P&gt;1) How should we approach the choice between persist() and cache() when running Spark on YARN? For example, how can a Spark developer estimate how much memory will be available to their YARN queue and use that number to guide their persist() choice? Or should they use some other technique?&lt;/P&gt;&lt;P&gt;&lt;A href="http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence" target="_blank"&gt;http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2) With Spark on YARN, does the "RDD" exist only as long as the Spark driver lives, only as long as the RDD's related Spark worker containers live, or for some other time frame?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 09 Nov 2015 23:52:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96746#M10301</guid>
      <dc:creator>wfloyd</dc:creator>
      <dc:date>2015-11-09T23:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: Choosing RDD Persistence and Caching with Spark on YARN</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96747#M10302</link>
      <description>&lt;P&gt;There is also an article on "How to size memory only RDDs" which references setting spark.executor.memory and spark.yarn.executor.memoryOverhead. Should we use these as well in planning memory/RDD usage?&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://www.altiscale.com/blog/tips-and-tricks-for-running-spark-on-hadoop-part-3-rdd-persistence/"&gt;https://www.altiscale.com/blog/tips-and-tricks-for-running-spark-on-hadoop-part-3-rdd-persistence/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Nov 2015 23:54:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96747#M10302</guid>
      <dc:creator>wfloyd</dc:creator>
      <dc:date>2015-11-09T23:54:32Z</dc:date>
    </item>
    <item>
      <title>Re: Choosing RDD Persistence and Caching with Spark on YARN</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96748#M10303</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/528/rsriharsha.html" nodeid="528"&gt;@Ram Sriharsha&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Tue, 10 Nov 2015 03:01:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96748#M10303</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-11-10T03:01:59Z</dc:date>
    </item>
    <item>
      <title>Re: Choosing RDD Persistence and Caching with Spark on YARN</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96749#M10304</link>
      <description>&lt;P&gt;The RDD exists only as long as the Spark driver lives. If one or more of the Spark worker containers die, the lost portions of the RDD will be recomputed and cached again.&lt;/P&gt;&lt;P&gt;At the RDD level, persist() and cache() are actually the same: cache() is simply persist() with the default storage level.&lt;/P&gt;&lt;P&gt;persist() has more options, though. Its default storage level is StorageLevel.MEMORY_ONLY,&lt;/P&gt;&lt;P&gt;but you can persist at various other storage levels.&lt;/P&gt;&lt;P&gt;&lt;A href="http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence" target="_blank"&gt;http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Nov 2015 03:05:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96749#M10304</guid>
      <dc:creator>rsriharsha</dc:creator>
      <dc:date>2015-11-10T03:05:52Z</dc:date>
    </item>
    <item>
      <title>Re: Choosing RDD Persistence and Caching with Spark on YARN</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96750#M10305</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/528/rsriharsha.html" nodeid="528"&gt;@Ram Sriharsha&lt;/A&gt; for chiming in. Really appreciate it.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Nov 2015 03:10:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Choosing-RDD-Persistence-and-Caching-with-Spark-on-YARN/m-p/96750#M10305</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-11-10T03:10:32Z</dc:date>
    </item>
  </channel>
</rss>

