<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How does spark runtime jar (../spark-2.0.1/jars) get distributed to Physical Worker Node in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48451#M48288</link>
    <description>&lt;P&gt;When running Spark on YARN, the Spark archive gets distributed to worker nodes via the ContainerLocalizer (also known as the distributed cache). Spark first uploads the files to HDFS, and worker nodes then download the jar from HDFS when needed. The localizer has checks to only download the jar when it has changed or has been removed from the worker, so it can reuse the jar and not have to download it again if it still exists locally.&lt;/P&gt;</description>
    <pubDate>Tue, 13 Dec 2016 03:48:53 GMT</pubDate>
    <dc:creator>hubbarja</dc:creator>
    <dc:date>2016-12-13T03:48:53Z</dc:date>
    <item>
      <title>How does spark runtime jar (../spark-2.0.1/jars) get distributed to Physical Worker Node</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48384#M48287</link>
      <description>&lt;P&gt;As per my understanding, Spark does not need to be installed on all the nodes in a YARN cluster. A Spark installation is only required on the node (usually the gateway node) from which the spark-submit script is fired.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As per the Spark programming guide:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;To make Spark runtime jars accessible from YARN side, you can specify spark.yarn.archive or spark.yarn.jars.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;How do the libraries containing Spark code (i.e. the Spark runtime jars available in ../spark-2.0.1-bin-hadoop2.6/jars) get distributed to the physical worker nodes (where the executors are launched) in a YARN cluster?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Do these libraries get copied to the worker nodes every time we run a Spark application?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank You.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:50:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48384#M48287</guid>
      <dc:creator>dkdeepak</dc:creator>
      <dc:date>2022-09-16T10:50:25Z</dc:date>
    </item>
    <item>
      <title>Re: How does spark runtime jar (../spark-2.0.1/jars) get distributed to Physical Worker Node</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48451#M48288</link>
      <description>&lt;P&gt;When running Spark on YARN, the Spark archive gets distributed to worker nodes via the ContainerLocalizer (also known as the distributed cache). Spark first uploads the files to HDFS, and worker nodes then download the jar from HDFS when needed. The localizer has checks to only download the jar when it has changed or has been removed from the worker, so it can reuse the jar and not have to download it again if it still exists locally.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2016 03:48:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48451#M48288</guid>
      <dc:creator>hubbarja</dc:creator>
      <dc:date>2016-12-13T03:48:53Z</dc:date>
    </item>
    <item>
      <title>Re: How does spark runtime jar (../spark-2.0.1/jars) get distributed to Physical Worker Node</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48520#M48289</link>
      <description>&lt;P&gt;As you mentioned, "The localizer has some checks to only download the jar when it has changed or has been removed from the worker".&lt;/P&gt;&lt;P&gt;Do any similar checks happen while copying the Spark jar from the node (from which the Spark application is launched) to HDFS? That is, for multiple Spark application launches, will the Spark jar be copied to HDFS only once?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank You.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Dec 2016 14:40:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48520#M48289</guid>
      <dc:creator>dkdeepak</dc:creator>
      <dc:date>2016-12-14T14:40:51Z</dc:date>
    </item>
    <item>
      <title>Re: How does spark runtime jar (../spark-2.0.1/jars) get distributed to Physical Worker Node</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48556#M48290</link>
      <description>&lt;P&gt;When Spark determines it needs to use YARN's localizer, it will always upload the jar to HDFS; it does not check whether the file has changed before uploading.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When using the Spark distribution included with CDH, the Spark jars are already installed on all nodes and the configuration specifies that the jars are local. When they are specified as local, Spark does not upload the jars and YARN's localizer is not used.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Dec 2016 03:10:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-spark-runtime-jar-spark-2-0-1-jars-get-distributed/m-p/48556#M48290</guid>
      <dc:creator>hubbarja</dc:creator>
      <dc:date>2016-12-15T03:10:08Z</dc:date>
    </item>
  </channel>
</rss>

