Created on 12-11-2015 10:25 PM - edited 09-16-2022 02:52 AM
I'm setting the following exports from the shell:
export SPARK_HOME="/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark"export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
I'm running the code below with spark-submit TestPyEnv.py:
import os
import sys

# Path for spark source folder
#os.environ['SPARK_HOME']="/opt/cloudera/parcels/CDH/lib/spark"

# Append pyspark to Python Path
#sys.path.append("/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark/python/")

help('modules')

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print ("Successfully imported Spark Modules")
except ImportError as e:
    print ("Can not import Spark Modules", e)
    sys.exit(1)
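The commented-out lines hint at doing the same thing in-process before the import; a rough sketch of that variant (the parcel path is just copied from my exports above, and the py4j zip globbed rather than hard-coded) would be:

import glob
import os
import sys

# Hypothetical in-process equivalent of the shell exports above; the parcel
# path is the one from my environment.
spark_home = "/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark"
os.environ["SPARK_HOME"] = spark_home
sys.path.insert(0, os.path.join(spark_home, "python"))
# Pick up whatever py4j zip ships with the parcel instead of hard-coding 0.8.2.1.
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, zip_path)

from pyspark import SparkConf, SparkContext
print("Successfully imported Spark Modules")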
I'm not able to figure out for the life of me why the SparkContext import is not working.
('Can not import Spark Modules', ImportError('cannot import name SparkContext',))
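If it helps with narrowing this down, that particular error means Python did find some module named pyspark, just not one exposing SparkContext, so a throwaway check like this (only a sketch) shows where it is resolving from:

import sys

# Throwaway diagnostic: report where the pyspark package actually resolves
# from, and what sys.path looks like at that point.
try:
    import pyspark
    print("pyspark resolved to:", pyspark.__file__)
except ImportError as e:
    print("pyspark itself is not importable:", e)
print("sys.path =", sys.path)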
Created 12-12-2015 12:05 PM
Here are the contents of /etc/spark/conf/spark-env.sh:
##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-''}

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}

if [ -n "$HADOOP_HOME" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

SPARK_EXTRA_LIB_PATH=""
if [ -n "$SPARK_EXTRA_LIB_PATH" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARK_EXTRA_LIB_PATH
fi

export LD_LIBRARY_PATH
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}

# This is needed to support old CDH versions that use a forked version
# of compute-classpath.sh.
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib

# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
export SPARK_LOCAL_DIRS=/dev/shm
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/solr/*:/usr/lib/solr/lib/*"
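For reference, a quick throwaway check (just a sketch) to see whether this deployed spark-env.sh sets up anything Python-related at all:

# Throwaway check: list any PYTHON-related lines in the deployed spark-env.sh
# to see whether it configures pyspark at all.
with open("/etc/spark/conf/spark-env.sh") as env_file:
    for number, line in enumerate(env_file, 1):
        if "PYTHON" in line.upper():
            print(number, line.rstrip())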
Created 12-12-2015 12:29 PM
What version of CM are you using, and have you recently attempted to redeploy the Spark gateway client configs?
The below is what I have out of the box in CM 5.5:
#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-''}

### Some definitions needed by older versions of CDH.
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib

SPARK_PYTHON_PATH=""
if [ -n "$SPARK_PYTHON_PATH" ]; then
  export PYTHONPATH="$PYTHONPATH:$SPARK_PYTHON_PATH"
fi

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}

if [ -n "$HADOOP_HOME" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

SPARK_EXTRA_LIB_PATH="/opt/cloudera/parcels/GPLEXTRAS-5.5.0-1.cdh5.5.0.p0.7/lib/hadoop/lib/native"
if [ -n "$SPARK_EXTRA_LIB_PATH" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARK_EXTRA_LIB_PATH
fi

export LD_LIBRARY_PATH

HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
  HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR

PYLIB="$SPARK_HOME/python/lib"
if [ -f "$PYLIB/pyspark.zip" ]; then
  PYSPARK_ARCHIVES_PATH=
  for lib in "$PYLIB"/*.zip; do
    if [ -n "$PYSPARK_ARCHIVES_PATH" ]; then
      PYSPARK_ARCHIVES_PATH="$PYSPARK_ARCHIVES_PATH,local:$lib"
    else
      PYSPARK_ARCHIVES_PATH="local:$lib"
    fi
  done
  export PYSPARK_ARCHIVES_PATH
fi

# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
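Compared to the 5.4.5 file above, the Python-related additions here are the SPARK_PYTHON_PATH and PYLIB / PYSPARK_ARCHIVES_PATH blocks. A rough sketch (paths assumed from the configs above, not verified) to check whether those zips actually exist under the parcel:

import glob
import os

# Hypothetical check: the PYLIB block above expects pyspark.zip and a py4j zip
# under $SPARK_HOME/python/lib, so list whatever zips are actually there.
spark_home = os.environ.get("SPARK_HOME", "/opt/cloudera/parcels/CDH/lib/spark")
pylib = os.path.join(spark_home, "python", "lib")
for zip_path in sorted(glob.glob(os.path.join(pylib, "*.zip"))):
    print("found:", zip_path)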