<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: SPARK PYSPARK SPARKR : Question Versionning in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/SPARK-PYSPARK-SPARKR-Question-Versionning/m-p/104054#M66951</link>
    <description>&lt;P&gt;&lt;STRONG&gt;For Python:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I'd recommend installing &lt;A href="https://www.continuum.io/downloads"&gt;Anaconda (Python 2.7)&lt;/A&gt; on all nodes of your cluster. If your developer would like to manually add Python files/scripts, they can use the &lt;A href="http://spark.apache.org/docs/1.6.2/submitting-applications.html"&gt;--py-files&lt;/A&gt; argument of the spark-submit command. Alternatively, you can reference Python scripts/files from within your PySpark code using &lt;A href="http://spark.apache.org/docs/latest/api/python/pyspark.html"&gt;addPyFile&lt;/A&gt;, e.g. sc.addPyFile("mymodule.py"). As an FYI, PySpark will run fine with Python 2.6 installed, but you won't be able to use more recent packages.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;For R:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;As &lt;A rel="user" href="https://community.cloudera.com/users/36/lgeorge.html" nodeid="36"&gt;@lgeorge&lt;/A&gt; mentioned, you will want to install R (and all required packages) on each node of your cluster. Also make sure your JAVA_HOME environment variable is set; then you should be able to launch SparkR.&lt;/P&gt;</description>
    <pubDate>Fri, 30 Sep 2016 00:37:38 GMT</pubDate>
    <dc:creator>dzaratsian</dc:creator>
    <dc:date>2016-09-30T00:37:38Z</dc:date>
    <item>
      <title>SPARK PYSPARK SPARKR : Question Versionning</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-PYSPARK-SPARKR-Question-Versionning/m-p/104052#M66949</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I'm not a developer; I'm an admin for a Hadoop platform.&lt;/P&gt;&lt;P&gt;We have installed HDP 2.4.2, which ships with Spark 1.6.1. My questions concern versioning for Python and R.&lt;/P&gt;&lt;P&gt;All my servers run CentOS 6.8 with Python 2.6.6, so is it possible to use PySpark?&lt;/P&gt;&lt;P&gt;My developer says he wants Python 2.7.x; I don't know why. If I need to install Python 2.7 or 3, does it need to be installed on the whole platform, or just on one datanode or the master?&lt;/P&gt;&lt;P&gt;SparkR needs R installed; it is not shipped with Spark?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Sep 2016 21:17:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-PYSPARK-SPARKR-Question-Versionning/m-p/104052#M66949</guid>
      <dc:creator>maykiwogno</dc:creator>
      <dc:date>2016-09-29T21:17:17Z</dc:date>
    </item>
    <item>
      <title>Re: SPARK PYSPARK SPARKR : Question Versionning</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-PYSPARK-SPARKR-Question-Versionning/m-p/104053#M66950</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10363/maykiwogno.html" nodeid="10363"&gt;@mayki wogno&lt;/A&gt;, regarding your last question--I believe you need to install R separately before using it with Spark/SparkR. There is additional info in our HDP 2.5.0 documentation (SparkR is in tech preview until HDP 2.5); see &lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_spark-r.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_spark-r.html&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Sep 2016 22:06:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-PYSPARK-SPARKR-Question-Versionning/m-p/104053#M66950</guid>
      <dc:creator>lgeorge</dc:creator>
      <dc:date>2016-09-29T22:06:50Z</dc:date>
    </item>
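    <!-- Editor's note: the prerequisites described in the reply above (R installed on each node, JAVA_HOME set) can be spot-checked before launching SparkR. A minimal sketch; the host names below are hypothetical placeholders for your cluster nodes:

    ```shell
    # Check that R is installed and JAVA_HOME is set on each cluster node.
    # Replace the host names with your actual datanode/master hosts.
    for host in datanode1 datanode2 master1; do
      echo "== $host =="
      ssh "$host" 'command -v Rscript || echo "R is NOT installed"'
      ssh "$host" 'echo "JAVA_HOME=$JAVA_HOME"'
    done

    # Once R and JAVA_HOME are in place, launch the SparkR shell
    # from a node where Spark is installed:
    #   sparkR
    ```
    -->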
    <item>
      <title>Re: SPARK PYSPARK SPARKR : Question Versionning</title>
      <link>https://community.cloudera.com/t5/Support-Questions/SPARK-PYSPARK-SPARKR-Question-Versionning/m-p/104054#M66951</link>
      <description>&lt;P&gt;&lt;STRONG&gt;For Python:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I'd recommend installing &lt;A href="https://www.continuum.io/downloads"&gt;Anaconda (Python 2.7)&lt;/A&gt; on all nodes of your cluster. If your developer would like to manually add Python files/scripts, they can use the &lt;A href="http://spark.apache.org/docs/1.6.2/submitting-applications.html"&gt;--py-files&lt;/A&gt; argument of the spark-submit command. Alternatively, you can reference Python scripts/files from within your PySpark code using &lt;A href="http://spark.apache.org/docs/latest/api/python/pyspark.html"&gt;addPyFile&lt;/A&gt;, e.g. sc.addPyFile("mymodule.py"). As an FYI, PySpark will run fine with Python 2.6 installed, but you won't be able to use more recent packages.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;For R:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;As &lt;A rel="user" href="https://community.cloudera.com/users/36/lgeorge.html" nodeid="36"&gt;@lgeorge&lt;/A&gt; mentioned, you will want to install R (and all required packages) on each node of your cluster. Also make sure your JAVA_HOME environment variable is set; then you should be able to launch SparkR.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Sep 2016 00:37:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/SPARK-PYSPARK-SPARKR-Question-Versionning/m-p/104054#M66951</guid>
      <dc:creator>dzaratsian</dc:creator>
      <dc:date>2016-09-30T00:37:38Z</dc:date>
    </item>
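    <!-- Editor's note: the two ways of shipping extra Python code mentioned in the reply above (spark-submit --py-files, and sc.addPyFile from driver code) can be sketched as follows. A minimal sketch; mymodule.py and job.py are hypothetical file names, and the commands assume a Spark 1.6 installation as shipped with HDP 2.4.2:

    ```shell
    # Ship an extra Python module to every executor at submit time:
    spark-submit --master yarn \
      --py-files mymodule.py \
      job.py

    # Or, equivalently, from inside the PySpark driver code:
    #   sc.addPyFile("mymodule.py")  # distributes the file to executors
    #   import mymodule              # now importable in tasks as well
    ```
    -->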
  </channel>
</rss>