
PySpark: cannot import name SparkContext

Explorer

I'm setting the exports below from the shell.

export SPARK_HOME="/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark"
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

Running the code below as spark-submit TestPyEnv.py:

import os
import sys

# Path for spark source folder
#os.environ['SPARK_HOME'] = "/opt/cloudera/parcels/CDH/lib/spark"

# Append pyspark to Python Path
#sys.path.append("/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark/python/")

help('modules')

try:
    from pyspark import SparkContext
    from pyspark import SparkConf

    print("Successfully imported Spark Modules")

except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)

I'm not able to figure out for the life of me why the SparkContext import is not working.

('Can not import Spark Modules', ImportError('cannot import name SparkContext',))


Mentor
On a parcel installation your PySpark should already be set up to be readily used with spark-submit. Is there a reason you're looking to set the SPARK_HOME and PYTHONPATH variables manually? These are auto-handled by CM for you, via your /etc/spark/conf/spark-env.sh.

Does running "spark-submit TestPyEnv.py" in a clean default environment throw an error?
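
One way to rule out the manual overrides for a single run is to unset them at invocation time (a minimal sketch; env -u is a GNU coreutils extension, and spark-submit is assumed to be on the PATH):

# Run spark-submit without the hand-set variables, letting the
# CM-generated /etc/spark/conf/spark-env.sh supply SPARK_HOME and PYTHONPATH.
env -u SPARK_HOME -u PYTHONPATH spark-submit TestPyEnv.py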

Explorer
Yes, "spark-submit TestPyEnv.py" throws an error in a clean env too, and that's why I was trying to see if setting the env variables would help...

Mentor
Thank you for trying it out. Could you also post your /etc/spark/conf/spark-env.sh contents here, please?

P.S. Pro-tip: When using full paths to a file under the parcel, use its symlinks to stay upgrade-compatible, i.e. instead of /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/, simply use /opt/cloudera/parcels/CDH/.
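
For example, the manual exports from the question would then read (a sketch of the same three lines; the py4j version bundled under $SPARK_HOME/python/lib can differ between CDH releases):

export SPARK_HOME="/opt/cloudera/parcels/CDH/lib/spark"
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH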

Explorer

Here are the contents of /etc/spark/conf/spark-env.sh:

##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-''}

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}

if [ -n "$HADOOP_HOME" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

SPARK_EXTRA_LIB_PATH=""
if [ -n "$SPARK_EXTRA_LIB_PATH" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARK_EXTRA_LIB_PATH
fi

export LD_LIBRARY_PATH
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}

# This is needed to support old CDH versions that use a forked version
# of compute-classpath.sh.
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib

# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
export SPARK_LOCAL_DIRS=/dev/shm
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/solr/*:/usr/lib/solr/lib/*"

Mentor

What version of CM are you using, and have you recently attempted to redeploy the Spark gateway client configs?

Below is what I have out of the box in CM 5.5:

#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-''}

### Some definitions needed by older versions of CDH.
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib

SPARK_PYTHON_PATH=""
if [ -n "$SPARK_PYTHON_PATH" ]; then
  export PYTHONPATH="$PYTHONPATH:$SPARK_PYTHON_PATH"
fi

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}

if [ -n "$HADOOP_HOME" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

SPARK_EXTRA_LIB_PATH="/opt/cloudera/parcels/GPLEXTRAS-5.5.0-1.cdh5.5.0.p0.7/lib/hadoop/lib/native"
if [ -n "$SPARK_EXTRA_LIB_PATH" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARK_EXTRA_LIB_PATH
fi

export LD_LIBRARY_PATH

HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
  HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR

PYLIB="$SPARK_HOME/python/lib"
if [ -f "$PYLIB/pyspark.zip" ]; then
  PYSPARK_ARCHIVES_PATH=
  for lib in "$PYLIB"/*.zip; do
    if [ -n "$PYSPARK_ARCHIVES_PATH" ]; then
      PYSPARK_ARCHIVES_PATH="$PYSPARK_ARCHIVES_PATH,local:$lib"
    else
      PYSPARK_ARCHIVES_PATH="local:$lib"
    fi
  done
  export PYSPARK_ARCHIVES_PATH
fi

# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
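
One difference worth noting: the PYLIB block above, which assembles PYSPARK_ARCHIVES_PATH from pyspark.zip and the py4j zip, has no counterpart in the spark-env.sh posted earlier in the thread. A quick way to check what your environment produces (a sketch; assumes a CM-managed config at the default path):

# Source the CM-generated env script, then inspect the result.
source /etc/spark/conf/spark-env.sh
echo "$PYSPARK_ARCHIVES_PATH"
# On a CM 5.5 layout this prints something like:
# local:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.2.1-src.zip,local:/opt/cloudera/parcels/CDH/lib/spark/python/lib/pyspark.zip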