
Unable to start pyspark or use numpy on Sandbox 2.4

I am currently on HDP 2.4, which is running Python 2.6.6.

When I run the following in Zeppelin:

%pyspark

import numpy

import scipy

import pandas

import matplotlib

I get:

failed to start pyspark

Sometimes I get this instead:

numpy not found
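To narrow down which of the imports above is actually failing, a small check like the following can be run in a plain Python shell on the sandbox (or in a %pyspark paragraph once the interpreter starts). It is only a sketch: the helper name check_imports is made up for illustration, and it avoids importlib so that it should also run on Python 2.6.

```python
import sys

def check_imports(names):
    """Return {module name: version string, or None if the import failed}."""
    results = {}
    for name in names:
        try:
            module = __import__(name)
        except ImportError:
            results[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results

# Show which interpreter is answering, then the status of each package.
print(sys.executable)
print(sys.version)
report = check_imports(["numpy", "scipy", "pandas", "matplotlib"])
for name in sorted(report):
    print("%s: %s" % (name, report[name] if report[name] else "NOT FOUND"))
```

If the interpreter path printed here differs from the one Zeppelin is configured to use, the packages may be installed for a different Python than the one pyspark launches.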

When I run the following in Zeppelin:

System.getenv().get("MASTER")

System.getenv().get("SPARK_YARN_JAR")

System.getenv().get("HADOOP_CONF_DIR")

System.getenv().get("JAVA_HOME")

System.getenv().get("SPARK_HOME")

System.getenv().get("PYSPARK_PYTHON")

System.getenv().get("PYTHONPATH")

System.getenv().get("ZEPPELIN_JAVA_OPTS")

System.getenv().get("ZEPPELIN_PORT")

I get:

res0: String = yarn-client

res1: String = hdfs:///apps/zeppelin/zeppelin-spark-0.5.5-SNAPSHOT.jar

res2: String = /usr/hdp/current/hadoop-client/conf

res3: String = /usr/lib/jvm/java

res4: String = /usr/hdp/2.4.0.0-169/spark

res5: String = null

res6: String = /usr/hdp/current/spark-client//python/lib/py4j-0.9-src.zip:/usr/hdp/current/spark-client//python/:

res7: String =

-Dhdp.version=2.4.0.0-169

-Dspark.executor.memory=512m

-Dspark.executor.instances=2 -Dspark.yarn.queue=default

res8: String = null

I also see the following directories:

/usr/hdp/2.4.0.0-169/spark/python/pyspark/mllib

/usr/hdp/2.4.0.0-169/spark/python/pyspark/ml

When I click on the interpreter option, I see:

spark %spark (default), %pyspark, %sql, %dep

zeppelin.pyspark.python /usr/hdp/2.4.0.0-169/spark/python/pyspark
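The zeppelin.pyspark.python property is normally expected to name a Python executable (Zeppelin's default value is python), whereas the value shown above is the pyspark package directory. A quick way to test a candidate value is sketched below; the helper name looks_like_executable is hypothetical, and the path is simply the one from the interpreter settings above.

```python
import os

def looks_like_executable(path):
    """Return True only if `path` is a regular file with the execute bit set."""
    return os.path.isfile(path) and os.access(path, os.X_OK)

# Value currently configured for zeppelin.pyspark.python in this thread.
configured = "/usr/hdp/2.4.0.0-169/spark/python/pyspark"
print("%s usable as interpreter: %s" % (configured, looks_like_executable(configured)))
```

A directory always fails this check, so if the path above is indeed a directory on the sandbox, that alone could explain the "failed to start pyspark" message.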

These are the contents of the zeppelin-env.sh file:

export MASTER=yarn-client

export SPARK_YARN_JAR=hdfs:///apps/zeppelin/zeppelin-spark-0.5.5-SNAPSHOT.jar

export HADOOP_CONF_DIR=/etc/hadoop/conf

export JAVA_HOME=/usr/lib/jvm/java

export SPARK_HOME=/usr/hdp/current/spark-client/

#export PYSPARK_PYTHON=

export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.4.0.0-169

-Dspark.executor.memory=512m

-Dspark.executor.instances=2 -Dspark.yarn.queue=default"
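Note that the zeppelin-env.sh above references py4j-0.8.2.1-src.zip, while the PYTHONPATH reported at runtime points to py4j-0.9-src.zip. One way to see whether the entries on a given PYTHONPATH actually exist on disk is sketched below; the function name missing_path_entries is an illustrative assumption, not part of Spark or Zeppelin.

```python
import os

def missing_path_entries(pythonpath):
    """Return the entries of a PYTHONPATH-style string that do not exist on disk."""
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return [entry for entry in entries if not os.path.exists(entry)]

# Check whatever PYTHONPATH the current process inherited.
for entry in missing_path_entries(os.environ.get("PYTHONPATH", "")):
    print("missing: %s" % entry)
```

To test the value from zeppelin-env.sh specifically, pass that string (with ${SPARK_HOME} expanded by hand) to missing_path_entries instead of reading the environment.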


Re: Unable to start pyspark or use numpy on Sandbox 2.4

@Juan Cruz

You need to install the numpy package with the following command:

 yum install numpy

Re: Unable to start pyspark or use numpy on Sandbox 2.4

Thanks, Sandeep. I have already tried that and received:

Loaded plugins: fastestmirror, priorities

Setting up Install Process

Determining fastest mirrors

Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=6&arch=x86_64&repo=os&infra=stock error was

14: PYCURL ERROR 6 - "Couldn't resolve host 'mirrorlist.centos.org'"

Error: Cannot find a valid baseurl for repo: base
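The PYCURL ERROR 6 above says the sandbox could not resolve mirrorlist.centos.org, which points at name resolution (DNS or general network access) inside the VM rather than at yum or numpy themselves. A minimal sketch for checking this from Python follows; can_resolve is a made-up helper name, and the resolver parameter exists only so the function can be exercised without a network.

```python
import socket

def can_resolve(host, resolver=socket.gethostbyname):
    """Return True if `host` resolves to an address, False on a lookup error."""
    try:
        resolver(host)
    except socket.error:
        # socket.gaierror (failed lookup) is a subclass of socket.error.
        return False
    return True
```

Calling can_resolve("mirrorlist.centos.org") on the sandbox should return False for as long as the yum error above persists; once it returns True, the yum install is worth retrying.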
