
PySpark in Spark 1.3 and IPython


I have a Cloudera 5 distribution with Spark 1.3. I installed IPython 1.2.1 to match the Python 2.6.6 on CentOS 6. I followed this tutorial ( ), placing the two files ('' and '', with the proper Spark directory listed) in my management-node home directory. I SSH'd into the management node, first set the SPARK_HOME environment variable, and then launched "ipython notebook --profile=pyspark". But after starting a Python 2 notebook in the browser, neither "from pyspark import SparkConf, SparkContext" nor "sc" was recognized; both gave import errors.
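
For what it's worth, here is a quick sanity check I run in a fresh notebook cell to see whether the startup file actually executed (just my own diagnostic sketch, not something from the tutorial):

import os
import sys

# Did the notebook kernel inherit SPARK_HOME from the shell?
print(os.environ.get('SPARK_HOME'))

# If the startup file ran, Spark's python dirs should be on sys.path
print([p for p in sys.path if 'spark' in p.lower()])

In my case the first print shows None and the list is empty, which matches the import errors above.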

I tried a different way, by first declaring

export SPARK_HOME='/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p898.573/lib/spark'
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --profile=pyspark"

and then launching with "pyspark".

This gave me a more promising error when trying to run "from pyspark.context import SparkContext": ImportError: No module named 'SocketServer'.
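
If I read that right, SocketServer is the Python 2 name of that module (Python 3 renamed it to socketserver), so I suspect pyspark may be picking up a different interpreter than the 2.6.6 I expect. I also wonder whether I need export PYSPARK_DRIVER_PYTHON=ipython so that pyspark launches IPython at all; that's my assumption from the Spark docs, not something the tutorial mentioned. A quick check from inside whatever session does start:

import sys

# Which interpreter is this session actually running?
print(sys.executable)
print(sys.version)  # I expect 2.6.6 here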


I think the obvious problem is that PySpark isn't being initialized when I launch IPython, but I'm not sure why. Here is what I have in my '' file:

import os
import sys

# Locate the Spark install from the environment
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put Spark's Python sources on sys.path (python/lib/ holds the py4j zip)
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/'))

# Execute Spark's Python startup script (execfile is Python 2 only)
execfile(os.path.join(spark_home, 'python/pyspark/'))
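
Once the path setup works, this is the minimal smoke test I'd expect to run in the first notebook cell (just a sketch; if the startup script already creates sc, the SparkContext line isn't needed):

from pyspark import SparkConf, SparkContext

# Build a small local context just to prove the import path and py4j wiring work
conf = SparkConf().setMaster('local[2]').setAppName('pyspark-notebook-test')
sc = SparkContext(conf=conf)

# Trivial job: should print 45 if the context is healthy
print(sc.parallelize(range(10)).sum())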