pyspark ImportError: No module named numpy
Labels: Apache Spark
Created ‎06-02-2016 11:04 AM
File "/opt/mapr/spark/spark-1.6.1/python/lib/pyspark.zip/pyspark/mllib/__init__.py", line 25, in <module>
ImportError: No module named numpy
Created ‎06-02-2016 11:09 AM
numpy is missing here. Install it with pip install numpy.
Created ‎06-02-2016 11:11 AM
I have already installed numpy, and it works fine from the Python console. I also tried setting the Python environment variable in spark-env.sh, but that did not work.
Created ‎06-02-2016 11:16 AM
Are you running it in Spark local, standalone, or YARN mode?
Created ‎06-02-2016 11:41 AM
Do you have multiple versions of Python installed on your machine, or are you working in a Python test environment (testenv)? What is your PYTHONPATH?
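A quick way to answer these questions on any node is to ask the interpreter itself. The following is a minimal sketch, not something posted in the thread: describe_python_env is a hypothetical helper name, and it only reports on the interpreter that runs it, which may differ from the one Spark launches.

```python
from __future__ import print_function

import os
import sys


def describe_python_env():
    """Collect interpreter and numpy details for the current process."""
    info = {
        "executable": sys.executable,            # which python binary is running
        "version": sys.version.split()[0],       # e.g. 2.7.5
        "PYTHONPATH": os.environ.get("PYTHONPATH", ""),
    }
    try:
        import numpy
        info["numpy"] = numpy.__version__
        info["numpy_path"] = os.path.dirname(numpy.__file__)
    except ImportError as exc:
        # Same failure the Spark job reports, but for this interpreter.
        info["numpy"] = None
        info["numpy_error"] = str(exc)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_python_env().items()):
        print("{0}: {1}".format(key, value))
```

Running the script with each Python binary found on the machine (for example /usr/bin/python2.7) shows whether numpy is visible to that particular interpreter and where it is installed.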
Created ‎06-02-2016 12:18 PM
Nope, I have only one Python, version 2.7.5. Output of whereis python:
python: /usr/bin/python /usr/bin/python2.7 /usr/bin/python2.7-config /usr/lib/python2.7 /usr/lib64/python2.7 /etc/python /usr/include/python2.7 /usr/share/man/man1/python.1.gz
Created ‎09-01-2016 09:56 AM
I am facing the same problem. I have installed numpy on all the nodes, and I am running the job on YARN. In the directory /usr/bin I see python, python2, and python2.7, but only python2.7 is shown in green in the listing. echo $PYTHONPATH gave me an empty string. Afterwards, I executed export PYTHONPATH=/usr/bin/python2.7 on each node, but my job submission still exits with 'No module named numpy'. Any help?
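One possible reading of the post above: PYTHONPATH is a list of directories that Python searches for modules, not the path to an interpreter, so pointing it at /usr/bin/python2.7 would not change what the executors import. Spark documents PYSPARK_PYTHON as the variable that selects the Python binary, although nobody in this thread confirms having set it. The sketch below is one hedged way to see what each executor actually runs; probe_numpy and probe_executors are hypothetical helper names, and sc is assumed to be an existing SparkContext.

```python
from __future__ import print_function

import socket
import sys


def probe_numpy(_=None):
    """Return (hostname, interpreter path, numpy version or None)."""
    try:
        import numpy
        version = numpy.__version__
    except ImportError:
        version = None  # numpy is not importable by this interpreter
    return (socket.gethostname(), sys.executable, version)


def probe_executors(sc, num_tasks=20):
    """Run probe_numpy in num_tasks Spark tasks; return the distinct reports.

    sc is assumed to be a live SparkContext. Tasks are not guaranteed to
    land on every worker, so a larger num_tasks improves coverage.
    """
    reports = sc.parallelize(range(num_tasks), num_tasks).map(probe_numpy).collect()
    return sorted(set(reports), key=str)


if __name__ == "__main__":
    # Without a cluster, this only reports on the local interpreter.
    print(probe_numpy())
```

In the pyspark shell, print(probe_executors(sc)) would list one tuple per distinct host and interpreter; a None in the last position marks a worker where numpy cannot be imported.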
Created ‎09-01-2016 11:38 AM
Please check the permissions on the Python installation directories and confirm that your current user has the correct permissions.
Also try to reproduce the scenario as the root user. I expect it to work as root.
Created ‎02-10-2019 11:03 PM
As @Bhupendra Mishra indirectly pointed out, make sure to run the pip install numpy command from a root account (sudo does not suffice) after forcing the umask to 022 (umask 022), so that the access rights cascade to the Spark (or Zeppelin) user.
Also, be aware that numpy needs to be installed on each and every worker, and even on the master itself (depending on your component placement).
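With umask 022, new files are created as 644 and new directories as 755, so accounts other than the installer can read them. A hedged way to check whether an existing numpy install is reachable by another account is to inspect the permission bits along its path. This is a sketch only: blocked_for_others is a hypothetical helper, and it assumes the Spark (or Zeppelin) user falls under the "other" permission class rather than the owner or group.

```python
from __future__ import print_function

import os
import stat


def blocked_for_others(path):
    """Return path components that 'other' users cannot read (and, for
    directories, traverse), walking from path up to the filesystem root."""
    blocked = []
    current = os.path.abspath(path)
    while True:
        mode = os.stat(current).st_mode
        needed = stat.S_IROTH
        if stat.S_ISDIR(mode):
            needed |= stat.S_IXOTH  # directories also need the execute bit
        if mode & needed != needed:
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the root
            break
        current = parent
    return blocked


if __name__ == "__main__":
    try:
        import numpy
        print(blocked_for_others(os.path.dirname(numpy.__file__)))
    except ImportError:
        print("numpy is not importable by this user and interpreter")
```

An empty list suggests the install is readable by other accounts; any listed directory is a candidate for the permission problem described above.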
