PySpark: cannot import name SparkContext
Created on 12-11-2015 10:25 PM - edited 09-16-2022 02:52 AM
I'm setting the below exports from the shell.
export SPARK_HOME="/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark"
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
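For reference, the same thing folded into a single PYTHONPATH assignment (same paths as above, just consolidated; I don't know whether the duplication itself matters):

export SPARK_HOME="/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark"
# Spark's Python sources plus the bundled py4j zip, prepended once
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/build:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"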
I'm running the below code as spark-submit TestPyEnv.py:
import os
import sys

# Path for spark source folder
#os.environ['SPARK_HOME']="/opt/cloudera/parcels/CDH/lib/spark"

# Append pyspark to Python Path
#sys.path.append("/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark/python/")

help('modules')

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print ("Successfully imported Spark Modules")
except ImportError as e:
    print ("Can not import Spark Modules", e)
    sys.exit(1)
I'm not able to figure out for the life of me why the SparkContext import is not working.
('Can not import Spark Modules', ImportError('cannot import name SparkContext',))
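One way to narrow this down (a diagnostic sketch, not part of the original post) is to ask the same Python interpreter what is on its module search path and which pyspark, if any, it resolves:

# Show the interpreter's module search path
python -c 'import sys; print("\n".join(sys.path))'
# Show where pyspark is imported from (this raises the same ImportError if it cannot be found)
python -c 'import pyspark; print(pyspark.__file__)'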
Created 12-12-2015 08:21 AM
Does running "spark-submit TestPyEnv.py" in a clean, default environment throw an error?
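If it helps, one way to approximate a clean default run (illustrative commands, assuming a bash shell) is to drop the manual exports before submitting:

# Start from the CM-managed defaults rather than the hand-set variables (illustrative)
unset PYTHONPATH SPARK_HOME
spark-submit TestPyEnv.py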
Created 12-12-2015 11:28 AM
Created 12-12-2015 11:31 AM
P.S. Pro tip: when using full paths to files under the parcel, use the symlink instead to stay upgrade-compatible, i.e. instead of /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/, simply use /opt/cloudera/parcels/CDH/.
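Applied to the export above, that would look like:

# Same location, reached through the upgrade-stable parcel symlink
export SPARK_HOME="/opt/cloudera/parcels/CDH/lib/spark"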
Created 12-12-2015 12:05 PM
Here's the contents of /etc/spark/conf/spark-env.sh:
##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-''}

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
if [ -n "$HADOOP_HOME" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

SPARK_EXTRA_LIB_PATH=""
if [ -n "$SPARK_EXTRA_LIB_PATH" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARK_EXTRA_LIB_PATH
fi

export LD_LIBRARY_PATH
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}

# This is needed to support old CDH versions that use a forked version
# of compute-classpath.sh.
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib

# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
export SPARK_LOCAL_DIRS=/dev/shm
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/solr/*:/usr/lib/solr/lib/*"
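One way to see what environment this file actually produces (an illustrative command, not part of the original post) is to source it in a throwaway shell and dump the relevant variables:

# Source the CM-generated spark-env.sh in a subshell and print Spark/Python/Hadoop variables
bash -c 'source /etc/spark/conf/spark-env.sh; env | grep -E "SPARK|PYTHON|HADOOP"'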
Created 12-12-2015 12:29 PM
What version of CM are you using, and have you recently attempted to redeploy the Spark gateway client configurations?
The below is what I have out of the box in CM 5.5:
#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-''}

### Some definitions needed by older versions of CDH.
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib

SPARK_PYTHON_PATH=""
if [ -n "$SPARK_PYTHON_PATH" ]; then
  export PYTHONPATH="$PYTHONPATH:$SPARK_PYTHON_PATH"
fi

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
if [ -n "$HADOOP_HOME" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

SPARK_EXTRA_LIB_PATH="/opt/cloudera/parcels/GPLEXTRAS-5.5.0-1.cdh5.5.0.p0.7/lib/hadoop/lib/native"
if [ -n "$SPARK_EXTRA_LIB_PATH" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARK_EXTRA_LIB_PATH
fi

export LD_LIBRARY_PATH

HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
  HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR

PYLIB="$SPARK_HOME/python/lib"
if [ -f "$PYLIB/pyspark.zip" ]; then
  PYSPARK_ARCHIVES_PATH=
  for lib in "$PYLIB"/*.zip; do
    if [ -n "$PYSPARK_ARCHIVES_PATH" ]; then
      PYSPARK_ARCHIVES_PATH="$PYSPARK_ARCHIVES_PATH,local:$lib"
    else
      PYSPARK_ARCHIVES_PATH="local:$lib"
    fi
  done
  export PYSPARK_ARCHIVES_PATH
fi

# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
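One difference worth noting: the CM 5.5 file above registers the bundled pyspark.zip and py4j zip through the PYSPARK_ARCHIVES_PATH loop, while the CDH 5.4.5 file posted earlier has no such block. As a follow-up check (a suggestion, not a confirmed fix from this thread), it may be worth verifying on the 5.4.5 node that those zips exist under the parcel and that PYTHONPATH actually points at them:

# Diagnostic only: confirm the bundled Python zips exist and inspect the effective PYTHONPATH
ls "$SPARK_HOME"/python/lib/
echo "$PYTHONPATH"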
