ImportError: No module named numpy


Before posting this issue, we read all the solutions to similar issues that we could find.

 

Our cluster runs CDH 6.2, and we use Hue to work with it. Jobs are submitted via Hue.

 

When Spark code needs to import numpy, we get the error below:

 

Traceback (most recent call last):
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/test.py", line 79, in <module>
    from pyspark.ml.linalg import Vectors
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/__init__.py", line 22, in <module>
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/base.py", line 24, in <module>
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in <module>
ImportError: No module named numpy

 

We followed the official guide to install the Anaconda parcel, and set the Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh:

 

export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

We set the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf:

 

spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

We also set the YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):

 

PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

But none of these solved the import issue.

 

Thanks for any help.

Re: ImportError: No module named numpy

Please check whether numpy is actually installed on all of the NodeManagers. If not, install it using the command below (for Python 2.x):

 

pip install numpy

 

If already installed, let us know the following: 

 

1) Can you execute the same job outside of Hue, i.e. using spark2-submit? Post the full command here.

2) What Spark command do you use in Hue?
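
The check on the NodeManagers can also be done from inside a job rather than by logging in to each host. The following is a minimal sketch, not something taken from this thread: `probe_module` and `probe_executors` are hypothetical helper names, and `sc` is assumed to be an existing SparkContext. It reports which Python interpreter is running and whether numpy can be imported there. Spark does not guarantee that the tasks land on every NodeManager, so an empty list of failures is a hint rather than proof.

```python
import importlib
import socket
import sys


def probe_module(name="numpy"):
    """Report the running interpreter and whether `name` can be imported."""
    try:
        module = importlib.import_module(name)
        importable = True
        version = getattr(module, "__version__", "unknown")
    except ImportError:
        importable = False
        version = None
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "module": name,
        "importable": importable,
        "version": version,
    }


def probe_executors(sc, partitions=20):
    """Run the probe in `partitions` tasks and return the distinct results.

    `sc` is assumed to be an existing SparkContext.
    """
    return (
        sc.parallelize(range(partitions), partitions)
        # Convert each dict to a sorted tuple so results are hashable.
        .map(lambda _: tuple(sorted(probe_module().items())))
        .distinct()
        .collect()
    )


if __name__ == "__main__":
    # Driver-side check; in cluster mode this runs in the YARN container
    # that hosts the driver.
    print(probe_module())
```

Calling `probe_module()` as the first statement of test.py, before the `pyspark.ml` import, would show which interpreter the failing container actually uses.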

Re: ImportError: No module named numpy

With the command below, the job executes successfully.

 

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_CONF_DIR=/etc/alternatives/hadoop-conf
PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python spark-submit --master yarn --deploy-mode cluster test.py
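
The command above selects the interpreter through an environment variable in the submitting shell. As a hedged sketch, the same choice can be made explicit with `--conf` flags, using the `spark.yarn.appMasterEnv.*` setting already shown in this thread together with Spark's `spark.executorEnv.*` setting. `build_submit_command` is a hypothetical helper that only assembles the argument list. Whether Hue passes such settings through depends on how it submits the job, which this thread does not establish.

```python
def build_submit_command(script, python_path, master="yarn", deploy_mode="cluster"):
    """Assemble a spark-submit argument list that pins the Python interpreter."""
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        # Interpreter for the YARN application master (the driver in cluster mode).
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_path,
        # Interpreter for the executors.
        "--conf", "spark.executorEnv.PYSPARK_PYTHON=" + python_path,
        script,
    ]


if __name__ == "__main__":
    cmd = build_submit_command("test.py", "/opt/cloudera/parcels/Anaconda/bin/python")
    print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.call` without shell quoting concerns.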

 

In Hue, we open a Spark snippet, select the .py file, then run it. The same code can also be executed in Hue's notebook in YARN mode.

 

temp.png

Re: ImportError: No module named numpy

We installed Anaconda via CDH, and it is already activated.

 

tt.png

 

In the below file:

 

/run/cloudera-scm-agent/process/895-spark_on_yarn-SPARK_YARN_HISTORY_SERVER/spark-conf/spark-env.sh

we can see:

 

Screenshot from 2019-05-14 18-27-41.png

Re: ImportError: No module named numpy