Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Best way to install/integrate NumPy and scikit-learn into an existing Zeppelin install on the sandbox

Hello Experts!

I noticed that the default Zeppelin install on the HDP sandbox does not come with the NumPy and scikit-learn packages. I can install pip and then install the packages manually, but I want to make sure the Zeppelin installation will pick up those packages. Has anyone added these packages to their cluster?

This is the error that I am getting:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 162, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
ImportError: No module named numpy

The Zeppelin property zeppelin.install_python_packages in Ambari is set to false. I tried switching it to true, but it does not do anything, so I assume it is only read at install time.

Thanks!

1 ACCEPTED SOLUTION

Master Mentor
@azeltov

PySpark required NumPy in my case. I didn't try it in Zeppelin, but it wouldn't work in PySpark either if you don't have NumPy on your machine: https://github.com/dbist/datamunging

2 REPLIES

@Artem Ervits, your suggestion worked. This is what I ran to get it working on my sandbox:

 yum install -y numpy
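
For anyone landing here later: a quick way to confirm that Zeppelin actually picks up the package is to run a small check in a %pyspark paragraph after the install. The snippet below is only a sketch. It assumes the %pyspark interpreter runs the same system Python that yum installed NumPy into, and check_modules is a hypothetical helper name, not part of Zeppelin or PySpark. scikit-learn is imported under the name sklearn.

```python
import importlib
import sys


def check_modules(names):
    """Map each module name to its version string, or None if the import fails.

    check_modules is a hypothetical helper written for this example.
    """
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Same condition that produced "No module named numpy" above.
            results[name] = None
        else:
            # Not every module exposes __version__, so fall back to a label.
            results[name] = getattr(module, "__version__", "unknown")
    return results


# Show which Python binary the interpreter is really using.
print("interpreter: %s" % sys.executable)

for name, version in sorted(check_modules(["numpy", "sklearn"]).items()):
    if version is None:
        print("%s: NOT importable from this interpreter" % name)
    else:
        print("%s: version %s" % (name, version))
```

If a package still shows as not importable after installing it, the interpreter path printed on the first line is the place to look: the package may have gone into a different Python than the one Zeppelin launches.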