PySpark in Spark 1.3 and IPython

I have a Cloudera 5 distribution with Spark 1.3. I installed IPython 1.2.1 to match the Python 2.6.6 on CentOS 6. I followed this tutorial, placing the two files ('' and '', with the proper Spark directory listed) in my management node's home directory. I then SSH'd into the management node, set the SPARK_HOME environment variable, and launched "ipython notebook --profile=pyspark". However, after opening a Python 2 notebook in the browser, neither "from pyspark import SparkConf, SparkContext" nor "sc" was recognized; both gave import errors.
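
(For reference, a quick way to check where IPython 1.x expects profile startup files to live; the profile directory name profile_pyspark below is the conventional one used by such tutorials, not something confirmed in the post:)

import os
from IPython.utils.path import get_ipython_dir

# IPython 1.x keeps profiles under ~/.ipython by default
startup_dir = os.path.join(get_ipython_dir(), 'profile_pyspark', 'startup')
print(startup_dir, os.path.isdir(startup_dir))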

I tried a different way, first declaring

export SPARK_HOME='/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p898.573/lib/spark'
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --profile=pyspark"

and then launching with "pyspark".

This gave me a more promising error, "ImportError: No module named 'SocketServer'", when trying to run "from pyspark.context import SparkContext".
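
(For context: SocketServer is the Python 2 name of the standard-library module that Python 3 renamed to socketserver, so this ImportError usually means Python 2 code is being executed by a Python 3 interpreter. A minimal check, runnable in a notebook cell:)

import sys

# SocketServer (Python 2) became socketserver in Python 3
if sys.version_info[0] >= 3:
    import socketserver  # Python 3 name
else:
    import SocketServer  # Python 2 name

print(sys.version)  # shows which interpreter the notebook kernel is running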


Obviously PySpark isn't being started when I launch IPython, but I'm not sure why. Here is what I have in my '' file:



import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
# the archive name is missing from the post; setups like this normally point
# at the py4j source zip that ships under python/lib/
sys.path.insert(0, os.path.join(spark_home, 'python/lib/'))
# the filename is missing from the post; such tutorials typically execfile
# pyspark's shell.py here, which creates the SparkContext (sc)
execfile(os.path.join(spark_home, 'python/pyspark/'))
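
(Once that startup script runs cleanly, a minimal smoke test in a fresh notebook cell would look something like the sketch below; it assumes no SparkContext was already created by the startup script, and the app name is arbitrary:)

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('pyspark-notebook-test')
sc = SparkContext(conf=conf)
# a trivial job: should print 4950 if the context and executors work
print(sc.parallelize(range(100)).sum())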