<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Graphframes with pyspark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128677#M43417</link>
    <description>&lt;P&gt;We are trying to use the graphframes package with pyspark. It does not work in our production environment. In our dev environment it works because we can use the --packages option, which downloads the libraries from an external repository. We cannot use --packages in production because that cluster is not connected to the internet. Graphframes does work with Scala in production.&lt;/P&gt;&lt;P&gt;The default Python version is 2.6.6 and the HDP version is 2.4.2.&lt;/P&gt;&lt;P&gt;The command that works in dev:&lt;/P&gt;&lt;PRE&gt;pyspark --packages graphframes:graphframes:0.2.0-spark1.6-s_2.10&lt;/PRE&gt;&lt;P&gt;I copied all the jars that the --packages option downloaded in dev and passed them to --jars in the pyspark command in production, but that doesn't work. The same commands work in dev and with Spark on my Mac.&lt;/P&gt;&lt;PRE&gt;pyspark --py-files /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,/tmp/thirdpartyjars/org.scala-lang_scala-reflect-2.10.4.jar,/tmp/thirdpartyjars/org.slf4j_slf4j-api-1.7.7.jar --jars /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,/tmp/thirdpartyjars/org.scala-lang_scala-reflect-2.10.4.jar,/tmp/thirdpartyjars/org.slf4j_slf4j-api-1.7.7.jar&lt;/PRE&gt;&lt;P&gt;Console log:&lt;/P&gt;&lt;PRE&gt;Using Python version 2.6.6 (r266:84292, May 22 2015 08:34:51)
SparkContext available as sc, HiveContext available as sqlContext.
&amp;gt;&amp;gt;&amp;gt; 
&amp;gt;&amp;gt;&amp;gt; from graphframes import *
Traceback (most recent call last):
  File "&amp;lt;stdin&amp;gt;", line 1, in &amp;lt;module&amp;gt;
zipimport.ZipImportError: can't find module 'graphframes'&lt;/PRE&gt;</description>
    <pubDate>Thu, 13 Oct 2016 18:06:12 GMT</pubDate>
    <dc:creator>deepak.subhramanian</dc:creator>
    <dc:date>2016-10-13T18:06:12Z</dc:date>
    <item>
      <title>Graphframes with pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128677#M43417</link>
      <description>&lt;P&gt;We are trying to use the graphframes package with pyspark. It does not work in our production environment. In our dev environment it works because we can use the --packages option, which downloads the libraries from an external repository. We cannot use --packages in production because that cluster is not connected to the internet. Graphframes does work with Scala in production.&lt;/P&gt;&lt;P&gt;The default Python version is 2.6.6 and the HDP version is 2.4.2.&lt;/P&gt;&lt;P&gt;The command that works in dev:&lt;/P&gt;&lt;PRE&gt;pyspark --packages graphframes:graphframes:0.2.0-spark1.6-s_2.10&lt;/PRE&gt;&lt;P&gt;I copied all the jars that the --packages option downloaded in dev and passed them to --jars in the pyspark command in production, but that doesn't work. The same commands work in dev and with Spark on my Mac.&lt;/P&gt;&lt;PRE&gt;pyspark --py-files /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,/tmp/thirdpartyjars/org.scala-lang_scala-reflect-2.10.4.jar,/tmp/thirdpartyjars/org.slf4j_slf4j-api-1.7.7.jar --jars /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,/tmp/thirdpartyjars/org.scala-lang_scala-reflect-2.10.4.jar,/tmp/thirdpartyjars/org.slf4j_slf4j-api-1.7.7.jar&lt;/PRE&gt;&lt;P&gt;Console log:&lt;/P&gt;&lt;PRE&gt;Using Python version 2.6.6 (r266:84292, May 22 2015 08:34:51)
SparkContext available as sc, HiveContext available as sqlContext.
&amp;gt;&amp;gt;&amp;gt; 
&amp;gt;&amp;gt;&amp;gt; from graphframes import *
Traceback (most recent call last):
  File "&amp;lt;stdin&amp;gt;", line 1, in &amp;lt;module&amp;gt;
zipimport.ZipImportError: can't find module 'graphframes'&lt;/PRE&gt;</description>
      <pubDate>Thu, 13 Oct 2016 18:06:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128677#M43417</guid>
      <dc:creator>deepak.subhramanian</dc:creator>
      <dc:date>2016-10-13T18:06:12Z</dc:date>
    </item>
    <item>
      <title>Re: Graphframes with pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128678#M43418</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3696/deepaksubhramanian-1.html" nodeid="3696"&gt;@Deepak Subhramanian&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I'd recommend upgrading your Python version to 2.7 or higher (preferably &lt;A href="https://www.continuum.io/downloads"&gt;Anaconda&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;I was able to recreate your error, and it was resolved when I upgraded from Python 2.6 to Anaconda Python 2.7. Let me know if this does the trick for you!&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 20:13:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128678#M43418</guid>
      <dc:creator>dzaratsian</dc:creator>
      <dc:date>2016-10-13T20:13:12Z</dc:date>
    </item>
    <item>
      <title>Re: Graphframes with pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128679#M43419</link>
      <description>&lt;P&gt;Thanks, Dan. It works in our dev environment, which is on Python 2.6.6. When I expanded the graphframes jar and ran pyspark from the graphframes directory, I got the "Bad magic number" error, which indicates that the compiled .pyc file was built by a different Python version than the one importing it. But since it worked in our dev environment on 2.6, I think it should be possible to get it working with 2.6. I am not sure whether the --packages option did something extra to the Python packages after downloading them to make them work with Python 2.6.&lt;/P&gt;&lt;P&gt;We are looking at getting Anaconda on our cluster, but the upgrade will take time: our production process requires making sure the Python upgrade doesn't affect Ambari and the cluster.&lt;/P&gt;&lt;PRE&gt;from graphframes import *
Traceback (most recent call last):
  File "&amp;lt;stdin&amp;gt;", line 1, in &amp;lt;module&amp;gt;
ImportError: Bad magic number in graphframes/__init__.pyc&lt;/PRE&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/9368/ambari-server-install-requires-python26.html"&gt;https://community.hortonworks.com/questions/9368/ambari-server-install-requires-python26.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 20:41:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128679#M43419</guid>
      <dc:creator>deepak.subhramanian</dc:creator>
      <dc:date>2016-10-13T20:41:56Z</dc:date>
    </item>
    <item>
      <title>Re: Graphframes with pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128680#M43420</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3696/deepaksubhramanian-1.html" nodeid="3696"&gt;@Deepak Subhramanian&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I got this to work in Python 2.6 by following these steps:&lt;/P&gt;&lt;P&gt;1.) Download the graphframes-0.2.0-spark1.6-s_2.10.zip file from &lt;A href="https://spark-packages.org/package/graphframes/graphframes"&gt;here&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2.) Download the graphframes-0.2.0-spark1.6-s_2.10.jar file from &lt;A href="https://spark-packages.org/package/graphframes/graphframes"&gt;here&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3.) Unzip graphframes-0.2.0-spark1.6-s_2.10.zip&lt;/P&gt;&lt;P&gt;4.) Navigate to the python directory:&lt;/P&gt;&lt;PRE&gt;cd ./graphframes-0.2.0-spark1.6-s_2.10/python
&lt;/PRE&gt;&lt;P&gt;5.) Zip up the contents of this directory:&lt;/P&gt;&lt;PRE&gt;zip -r mypyfiles.zip *&lt;/PRE&gt;&lt;P&gt;6.) Launch pyspark, adjusting the paths to mypyfiles.zip and the jar to wherever you saved them:&lt;/P&gt;&lt;PRE&gt;./bin/pyspark --py-files mypyfiles.zip --jars graphframes-0.2.0-spark1.6-s_2.10.jar
&lt;/PRE&gt;&lt;P&gt;Give that a shot - let me know how it goes.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 23:23:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128680#M43420</guid>
      <dc:creator>dzaratsian</dc:creator>
      <dc:date>2016-10-13T23:23:09Z</dc:date>
    </item>
    <item>
      <title>Re: Graphframes with pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128681#M43421</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11915/dzaratsian.html" nodeid="11915"&gt;@Dan Zaratsian&lt;/A&gt; &lt;/P&gt;&lt;P&gt;That worked. Thanks a lot.  &lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 00:18:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Graphframes-with-pyspark/m-p/128681#M43421</guid>
      <dc:creator>deepak.subhramanian</dc:creator>
      <dc:date>2016-10-14T00:18:21Z</dc:date>
    </item>
  </channel>
</rss>

