Created 11-12-2015 12:39 PM
I have a Cloudera 5 distribution with Spark 1.3. I installed IPython 1.2.1 to match the Python 2.6.6 on CentOS 6. I followed this tutorial (https://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/comment-page-1/...) and placed the two files ('00-pyspark-setup.py' and 'ipython_notebook_config.py', with the proper Spark directory listed) in my home directory on the management node. I then SSH'd into the management node, set the SPARK_HOME environment variable, and launched "ipython notebook --profile=pyspark". However, after starting a Python 2 notebook in the IPython Notebook browser, neither "from pyspark import SparkConf, SparkContext" nor "sc" was recognized, and I got import errors.
I then tried a different approach, first declaring:
export SPARK_HOME='/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p898.573/lib/spark'
PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --profile=pyspark"
and then launching with "pyspark".
This gave me a more promising error when I tried to run "from pyspark.context import SparkContext": "ImportError: No module named 'SocketServer'".
I think I'm clearly not starting pyspark when I launch IPython, but I'm not sure why. Here is what I have in my '00-pyspark-setup.py' file:
<
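import os
import sys

# Locate the Spark installation from the environment.
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put PySpark and its bundled py4j on the import path.
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

# Run the PySpark shell bootstrap, which creates `sc` (execfile is Python 2 only).
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))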
>
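To narrow this down, a small check like the following could be run in a notebook cell. It is only a sketch: check_pyspark_env is a hypothetical helper (not part of Spark or IPython), and it assumes the same py4j archive name as in the setup file above. Since SocketServer is the Python 2 name of that module (Python 3 renamed it to socketserver), the script also prints which interpreter the notebook kernel is actually using.

```python
import os
import sys


def check_pyspark_env(environ=None, py4j_zip_name='py4j-0.8.2.1-src.zip'):
    """Return a list of problems found with the PySpark paths.

    `environ` defaults to os.environ; `py4j_zip_name` is an assumption
    taken from the setup file in this post and may differ per Spark build.
    """
    environ = os.environ if environ is None else environ
    problems = []

    spark_home = environ.get('SPARK_HOME')
    if not spark_home:
        # Nothing else can be checked without SPARK_HOME.
        return ['SPARK_HOME is not set']

    python_dir = os.path.join(spark_home, 'python')
    if not os.path.isdir(python_dir):
        problems.append('missing directory: ' + python_dir)

    py4j_zip = os.path.join(python_dir, 'lib', py4j_zip_name)
    if not os.path.isfile(py4j_zip):
        problems.append('missing py4j archive: ' + py4j_zip)

    return problems


if __name__ == '__main__':
    # Show which interpreter this kernel or shell is really running.
    print('Python executable: %s' % sys.executable)
    print('Python version: %s' % sys.version.split()[0])
    for problem in check_pyspark_env():
        print('Problem: %s' % problem)
```

An empty list from check_pyspark_env() would suggest the paths are fine and that the interpreter version is the next thing to look at.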