<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Can't get Pyspark interpreter to work on Zeppelin in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138012#M19194</link>
    <description>&lt;P&gt;Sandbox 2.4 ships with Python 2.6.6 (no idea why), which causes issues with the PySpark-based Zeppelin demo notebooks. The fix is to deploy a newer Python (e.g. the Anaconda package), add it to PATH, change PYSPARK_PYTHON in zeppelin-env.sh, and also update the interpreter settings in the Zeppelin notebook ("python" has to be replaced by the path to the new Python, e.g. /opt/anaconda2/bin/python2.7).&lt;/P&gt;</description>
    <pubDate>Thu, 17 Mar 2016 21:22:47 GMT</pubDate>
    <dc:creator>jan_rock</dc:creator>
    <dc:date>2016-03-17T21:22:47Z</dc:date>
    <item>
      <title>Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138002#M19184</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I've been trying unsuccessfully to configure the pyspark interpreter on Zeppelin. I can use pyspark from the CLI and can use the Spark interpreter from Zeppelin without issue. Here are the lines which aren't commented out in my zeppelin-env.sh file:&lt;/P&gt;&lt;PRE&gt;export MASTER=yarn-client
export ZEPPELIN_PORT=8090
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
export SPARK_HOME=/usr/hdp/current/spark-client/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/usr/bin/python
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH&lt;/PRE&gt;&lt;P&gt;Running a simple pyspark script in the interpreter gives this error:&lt;/P&gt;&lt;PRE&gt;Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname): org.apache.spark.SparkException: 
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar&lt;/PRE&gt;&lt;P&gt;I've tried adding this line to zeppelin-env.sh, which gives the same error as above:&lt;/P&gt;&lt;PRE&gt;export PYTHONPATH=/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-client/python/lib/pyspark.zip:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip&lt;/PRE&gt;&lt;P&gt;I've tried everything I could find on Google; any advice for debugging or fixing this problem?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Ian&lt;/P&gt;&lt;P&gt;Also, in case it's useful for debugging, here are some commands and their outputs:&lt;/P&gt;&lt;PRE&gt;System.getenv().get("MASTER")
res49: String = yarn-client
System.getenv().get("SPARK_YARN_JAR")
res50: String = null
System.getenv().get("HADOOP_CONF_DIR")
res51: String = /etc/hadoop/conf
System.getenv().get("JAVA_HOME")
res52: String = /usr/jdk64/jdk1.7.0_45
System.getenv().get("SPARK_HOME")
res53: String = /usr/hdp/2.3.2.0-2950/spark
System.getenv().get("PYSPARK_PYTHON")
res54: String = /usr/bin/python
System.getenv().get("PYTHONPATH")
res55: String = /usr/hdp/2.3.2.0-2950/spark/python:/usr/hdp/2.3.2.0-2950/spark/python/build:/usr/hdp/current/spark-client//python/lib/py4j-0.8.2.1-src.zip:/usr/hdp/current/spark-client//python/:/usr/hdp/current/spark-client//python:/usr/hdp/current/spark-client//python/build:/usr/hdp/current/spark-client//python:/usr/hdp/current/spark-client//python/build:
System.getenv().get("ZEPPELIN_JAVA_OPTS")
res56: String = -Dhdp.version=2.3.2.0-2950&lt;/PRE&gt;</description>
      <pubDate>Fri, 12 Feb 2016 03:31:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138002#M19184</guid>
      <dc:creator>rachmaninovquar</dc:creator>
      <dc:date>2016-02-12T03:31:45Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138003#M19185</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2745/rachmaninovquartet.html" nodeid="2745"&gt;@Ian Maloney&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://issues.apache.org/jira/browse/SPARK-6411" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-6411&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3CCAMJOb8kcGk0PqiOGJu6UoKCeysWCuSW3xwd5wRs8ikpMgD2DAg@mail.gmail.com%3E" target="_blank"&gt;https://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3CCAMJOb8kcGk0PqiOGJu6UoKCeysWCuSW3xwd5wRs8ikpMgD2DAg@mail.gmail.com%3E&lt;/A&gt;&lt;/P&gt;&lt;P&gt;That is because people usually don't package python files into their jars. For pyspark, however, this will work as long as the jar can be opened and its contents can be read. In my experience, if I am able to import the pyspark module by explicitly specifying the PYTHONPATH this way, then I can run pyspark on YARN without fail.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Feb 2016 06:00:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138003#M19185</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-12T06:00:44Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138004#M19186</link>
      <description>&lt;P&gt;See this tutorial &lt;A href="http://www.makedatauseful.com/python-spark-sql-zeppelin-tutorial/" target="_blank"&gt;http://www.makedatauseful.com/python-spark-sql-zeppelin-tutorial/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 12 Feb 2016 07:00:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138004#M19186</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-12T07:00:01Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138005#M19187</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj Sabharwal&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The Jira issue and tutorial in your comments are completely unrelated to my issue. I had previously found the link to the Apache mail archives; it's about using pyspark on YARN, which I can already do via the CLI. The only problem is with Zeppelin: it ignores the pythonpath in zeppelin-env.sh (the pythonpath is the same as in spark-env.sh).&lt;/P&gt;</description>
      <pubDate>Fri, 12 Feb 2016 23:43:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138005#M19187</guid>
      <dc:creator>rachmaninovquar</dc:creator>
      <dc:date>2016-02-12T23:43:00Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138006#M19188</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj Sabharwal&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I've also tried adding the pythonpath directly in the interpreter configs from the Zeppelin GUI, by creating a variable zeppelin.pyspark.pythonpath. I even tried exporting the PYTHONPATH variable from the Linux CLI. None of these worked. What bothers me is that the pythonpath is not changing, and I'm always getting the same error shown above.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Feb 2016 23:43:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138006#M19188</guid>
      <dc:creator>rachmaninovquar</dc:creator>
      <dc:date>2016-02-12T23:43:18Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138007#M19189</link>
      <description>&lt;P&gt;There was a bug in Zeppelin; it was fixed by Mina Lee and committed a day ago.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Feb 2016 03:54:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138007#M19189</guid>
      <dc:creator>rachmaninovquar</dc:creator>
      <dc:date>2016-02-26T03:54:52Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138008#M19190</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2745/rachmaninovquartet.html" nodeid="2745"&gt;@Ian Maloney&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I probably suffer from the same issue; I am trying to upgrade Python from version 2.6.6 to Anaconda3 Python 3.5. This is why I wondered what changing zeppelin.pyspark.pythonpath adds if PYSPARK_PYTHON was already changed in zeppelin-env.sh.&lt;/P&gt;&lt;P&gt;Also, as you mentioned, should I change the pythonpath in spark-env.sh as well? I did not change it before.&lt;/P&gt;&lt;P&gt;Peter&lt;/P&gt;</description>
      <pubDate>Fri, 26 Feb 2016 19:29:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138008#M19190</guid>
      <dc:creator>piotr_kuzmiak</dc:creator>
      <dc:date>2016-02-26T19:29:28Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138009#M19191</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3039/piotrkuzmiak.html" nodeid="3039"&gt;@Piotr Kuźmiak&lt;/A&gt;&lt;/P&gt;&lt;P&gt;What I had to do to resolve this was clone the latest Zeppelin from &lt;A href="https://github.com/apache/incubator-zeppelin"&gt;https://github.com/apache/incubator-zeppelin&lt;/A&gt;, build it with Maven, then update my zeppelin-env.sh and set the port number I wanted in zeppelin-site.xml.&lt;/P&gt;&lt;P&gt;I didn't have to change anything in the Zeppelin GUI. Here is what is set in my zeppelin-env.sh:&lt;/P&gt;&lt;PRE&gt;export MASTER=yarn-client
export ZEPPELIN_PORT=8090
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"
export SPARK_HOME=/usr/hdp/current/spark-client/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/usr/bin/python
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH&lt;/PRE&gt;</description>
      <pubDate>Fri, 26 Feb 2016 23:08:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138009#M19191</guid>
      <dc:creator>rachmaninovquar</dc:creator>
      <dc:date>2016-02-26T23:08:12Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138010#M19192</link>
      <description>&lt;P&gt;Grab the latest HDP 2.4 Sandbox. It comes with Spark 1.6, and the python interpreter works in Zeppelin.&lt;/P&gt;&lt;P&gt;Also, see hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/, where the pyspark interpreter is used.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Mar 2016 11:14:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138010#M19192</guid>
      <dc:creator>rhryniewicz</dc:creator>
      <dc:date>2016-03-03T11:14:35Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get Pyspark interpreter to work on Zeppelin</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138011#M19193</link>
      <description>&lt;P&gt;Sandbox 2.4 ships with Python 2.6.6 (no idea why), which causes issues with the PySpark-based Zeppelin demo notebooks. The fix is to deploy a newer Python (e.g. the Anaconda package), add it to PATH, change PYSPARK_PYTHON in zeppelin-env.sh, and also update the interpreter settings in the Zeppelin notebook ("python" has to be replaced by the path to the new Python, e.g. /opt/anaconda2/bin/python2.7).&lt;/P&gt;</description>
      <pubDate>Thu, 17 Mar 2016 21:22:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-t-get-Pyspark-interpreter-to-work-on-Zeppelin/m-p/138011#M19193</guid>
      <dc:creator>jan_rock</dc:creator>
      <dc:date>2016-03-17T21:22:32Z</dc:date>
    </item>
  </channel>
</rss>

