Best way to install/integrate numpy and scikit-learn into an existing Zeppelin install on the sandbox

Hello Experts!

I noticed that the default Zeppelin install on the HDP sandbox does not come with the numpy and scikit-learn packages. I can install pip and then install the packages manually, but I want to make sure the Zeppelin installation will pick them up. Has anyone added these packages to their cluster?

This is the error that I am getting:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 162, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
ImportError: No module named numpy

The Zeppelin property zeppelin.install_python_packages in Ambari is set to false. I tried switching it to true, but that does not do anything. I am assuming it is only read at install time.

Thanks!

1 ACCEPTED SOLUTION

Master Mentor
@azeltov

PySpark required numpy in my case. I didn't try it in Zeppelin, but it won't work in PySpark either if numpy isn't installed on your machine. See https://github.com/dbist/datamunging

2 REPLIES

@Artem Ervits your suggestion worked. This is what I ran to get it working on my sandbox:

 yum install -y numpy
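
To confirm that Zeppelin actually picks up the package after the install, a quick check like the one below could be run in a %pyspark paragraph (or in a plain pyspark shell). This is only a minimal sketch: the helper name check_modules is made up for illustration, and sklearn is listed only because the question mentions scikit-learn; the yum command above installs numpy only. It checks the Python used by the driver, which should be enough on a single-node sandbox but says nothing about other worker nodes.

```python
import importlib
import sys


def check_modules(names):
    """Return {module_name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Not every module exposes __version__, so fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


# Show which Python interpreter this session is running under.
print(sys.executable)

# A value of None means the interpreter cannot import that module.
print(check_modules(["numpy", "sklearn"]))
```

If numpy still shows None after the install, the interpreter path printed first is worth comparing against the Python that yum installed into, since the package has to be visible to the same Python that Zeppelin launches.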