<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Spark Standalone Cluster. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106344#M38087</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/12497/srikanthch45.html"&gt;RAMESH K&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Use Spark Standalone if you are a Spark-only shop and you don't care about resource contention with other services from the Hadoop ecosystem. Your Spark then uses all the resources of your cluster.&lt;/P&gt;&lt;P&gt;If your Spark is part of Hortonworks Data Platform and shares resources such as HDFS with other services, use Spark over YARN. That allows you to allocate the proper resources to Spark, avoid resource contention with other services, and meet your SLAs.&lt;/P&gt;&lt;P&gt;I hope this answer helps.&lt;/P&gt;</description>
    <pubDate>Thu, 18 Aug 2016 09:49:19 GMT</pubDate>
    <dc:creator>cstanca</dc:creator>
    <dc:date>2016-08-18T09:49:19Z</dc:date>
    <item>
      <title>Spark Standalone Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106341#M38084</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Can anyone please clarify my understanding of the use-case difference between 'Spark Standalone' and 'Spark on YARN' clusters?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Spark Standalone Cluster:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;If we do not have a huge volume of data to process, and the number of nodes required to process the data is fewer than about 10, then it is good to go with a Standalone cluster.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Spark on YARN Cluster:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;If you have a huge volume of data to process and have to use a larger number of nodes, and hence need a better cluster manager to manage those nodes, then it is good to go with a Spark on YARN cluster.&lt;/P&gt;&lt;P&gt;Also, can anyone please let me know the infrastructure specifications required for a 'Spark Standalone' cluster?&lt;/P&gt;&lt;P&gt;For example, in the case of 'Spark Standalone' with a 10-node cluster:&lt;/P&gt;&lt;P&gt;Can we have just one reliable machine running the cluster manager as the master node, and the remaining nine machines as worker (slave) nodes?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Aug 2016 10:28:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106341#M38084</guid>
      <dc:creator>srikanth_ch45</dc:creator>
      <dc:date>2016-08-17T10:28:57Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Standalone Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106342#M38085</link>
      <description>&lt;P&gt;Spark Standalone mode is Spark’s own built-in clustered environment. The Standalone Master is the resource manager for a Spark Standalone cluster, and the Standalone Workers are its worker processes. To install Spark in Standalone mode, you simply place a compiled version of Spark on each node in the cluster. You can launch a standalone cluster either manually, by starting a master and workers by hand, or by using the provided launch scripts.&lt;/P&gt;&lt;P&gt;In most enterprises, you already have a Hadoop cluster that is running YARN and want to leverage it for resource management instead of additionally running Spark Standalone mode. If you use YARN, a Spark application runs its driver and executors within YARN containers.&lt;/P&gt;&lt;P&gt;Irrespective of the deployment mode, a Spark application will consume the same resources it requires to process the data. In the case of YARN, you have to be aware of what other workloads (MR, Tez, etc.) will be running on the cluster at the same time the Spark application is executing, and size your machines accordingly.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Aug 2016 11:31:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106342#M38085</guid>
      <dc:creator>rreddy</dc:creator>
      <dc:date>2016-08-17T11:31:29Z</dc:date>
    </item>
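The manual and script-based launch paths described in the reply above can be sketched as shell commands. This is a hedged sketch: hostnames and worker names are placeholders, and the commands assume a compiled Spark distribution is already unpacked on every node (note that `start-slave.sh` was later renamed `start-worker.sh` in newer Spark releases).

```shell
# On the machine chosen as master (run from the Spark install directory):
./sbin/start-master.sh
# The master logs a URL of the form spark://master-host:7077

# On each worker machine, point the worker process at that URL:
./sbin/start-slave.sh spark://master-host:7077

# Alternatively, list worker hostnames in conf/slaves (one per line)
# and launch the whole cluster from the master node with one script:
echo "worker-1" > conf/slaves
echo "worker-2" >> conf/slaves
./sbin/start-all.sh
```

With the `conf/slaves` approach, one reliable master machine plus nine hostnames in the file gives exactly the 1-master/9-worker topology asked about in the original question.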
    <item>
      <title>Re: Spark Standalone Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106343#M38086</link>
      <description>&lt;P&gt;@Rahul, I am asking about the use-case difference. I mean, when should one use 'Spark Standalone' and when 'Spark on YARN'?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Aug 2016 19:44:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106343#M38086</guid>
      <dc:creator>srikanth_ch45</dc:creator>
      <dc:date>2016-08-17T19:44:38Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Standalone Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106344#M38087</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/12497/srikanthch45.html"&gt;RAMESH K&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Use Spark Standalone if you are a Spark-only shop and you don't care about resource contention with other services from the Hadoop ecosystem. Your Spark then uses all the resources of your cluster.&lt;/P&gt;&lt;P&gt;If your Spark is part of Hortonworks Data Platform and shares resources such as HDFS with other services, use Spark over YARN. That allows you to allocate the proper resources to Spark, avoid resource contention with other services, and meet your SLAs.&lt;/P&gt;&lt;P&gt;I hope this answer helps.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Aug 2016 09:49:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106344#M38087</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2016-08-18T09:49:19Z</dc:date>
    </item>
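The distinction drawn in the answer above shows up most concretely in how an application is submitted. A hedged sketch of the two submissions follows; the hostnames, queue name, application file, and resource sizes are illustrative, not taken from the thread.

```shell
# Standalone: point spark-submit at the standalone master's URL.
# Spark is the only tenant, so it can claim the whole cluster.
./bin/spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 8 \
  --executor-memory 4G \
  my_app.py

# YARN: the resource manager is discovered from the Hadoop config;
# a queue lets you cap Spark's share and protect other services' SLAs.
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue spark_queue \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4G \
  my_app.py
```

The `--queue` flag only has an effect on YARN, which is precisely the "allocate proper resources and avoid contention" point made in the answer.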
    <item>
      <title>Re: Spark Standalone Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106345#M38088</link>
      <description>&lt;P&gt;@Constantin&lt;/P&gt;&lt;P&gt;So can I say a Spark Standalone cluster is good for a smaller cluster (maybe fewer than 10 nodes), on the assumption that resource-management performance decreases as we increase the node count in Spark Standalone mode?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Aug 2016 10:26:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106345#M38088</guid>
      <dc:creator>srikanth_ch45</dc:creator>
      <dc:date>2016-08-18T10:26:55Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Standalone Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106346#M38089</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/12497/srikanthch45.html"&gt;@RAMESH K &lt;/A&gt;&lt;/P&gt;&lt;P&gt;There is no demonstrated correlation to support that statement. The number of nodes does not matter; what matters is how the resources are used. You can say that for a complex environment where multiple applications and users compete for resources and SLAs are important (jobs need to complete by a given time, users expect a response time under x seconds, etc.), a resource manager is a must. As such, running Spark over YARN just makes sense: it is better equipped to deliver in an environment with competition for resources.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Aug 2016 02:49:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106346#M38089</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2016-08-19T02:49:14Z</dc:date>
    </item>
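When the SLA argument above leads you to YARN, the per-application defaults can be pinned in `conf/spark-defaults.conf` so every job lands in a capped queue by default. The fragment below is a sketch only; the queue name and resource values are assumptions, not configuration from the thread.

```properties
# conf/spark-defaults.conf (illustrative values)
spark.master                     yarn
spark.yarn.queue                 spark_queue
spark.executor.memory            4g
spark.executor.cores             2
# Let YARN grow and shrink the executor count with the workload,
# which requires the external shuffle service on the NodeManagers:
spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true
```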
    <item>
      <title>Re: Spark Standalone Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106347#M38090</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/12497/srikanthch45.html"&gt;@RAMESH K&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If the response was helpful, please vote and accept it as the best answer. &lt;/P&gt;</description>
      <pubDate>Wed, 21 Sep 2016 21:28:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-Standalone-Cluster/m-p/106347#M38090</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2016-09-21T21:28:22Z</dc:date>
    </item>
  </channel>
</rss>

