How can I use the pandas library in PySpark in Zeppelin?

New Member

Hi everyone!

I am working with the PySpark interpreter in a Zeppelin notebook, and I want to use the functionality of the "pandas" library, but when I try this command:

import pandas as pd

I get the following error message:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-2633231603377305574.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
ImportError: No module named pandas

I have already installed pandas on the virtual machine where Zeppelin is running, and I restarted ambari-server as explained in the following post:

http://stackoverflow.com/questions/39221959/zeppelin-unable-to-import-pandas-numpy-scipy/39254183#39...

What should I do?

1 ACCEPTED SOLUTION


You might need to restart the Spark interpreter (or restart the Zeppelin notebook service in Ambari) so that the Python remote interpreters pick up the freshly installed pandas and can import it.

If you are running on a cluster, Zeppelin runs in YARN client mode and the Python remote interpreters are started on nodes other than the Zeppelin node. In this case, install pandas on all machines of your cluster and restart Zeppelin.
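To see which nodes are actually missing the library, something like the following could be run in a `%pyspark` paragraph after the restart. This is only a minimal sketch: `probe` is a hypothetical helper name, and the code assumes that Zeppelin has already created the SparkContext as `sc`. Outside Zeppelin, only the driver-side check runs.

```python
def probe(module_name):
    """Report where this Python process runs and whether module_name imports."""
    # Imports live inside the function so it stays self-contained
    # when Spark serializes it and ships it to the executors.
    import importlib
    import socket
    import sys

    try:
        module = importlib.import_module(module_name)
        found = True
        version = getattr(module, "__version__", "unknown")
    except ImportError:
        found = False
        version = None
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "module": module_name,
        "found": found,
        "version": version,
    }


# Driver side: the Python process that runs this paragraph.
print(probe("pandas"))

# Executor side: runs only if a SparkContext named `sc` exists
# (assumption: this is the case inside a Zeppelin %pyspark paragraph).
if "sc" in globals():
    results = (
        sc.parallelize(range(8), 8)
        .map(lambda _: probe("pandas"))
        .collect()
    )
    # Several tasks can land on the same node, so de-duplicate before printing.
    for row in sorted({(r["host"], r["python"], r["found"]) for r in results}):
        print(row)
```

If `found` is false on some hosts, pandas is probably missing there. If the `python` path differs from the interpreter into which pandas was installed, that mismatch is a likely cause, and installing the library for that specific interpreter may be what is needed. Note that the eight tasks are not guaranteed to reach every node of the cluster.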


3 REPLIES


Explorer

I am using Cloudera CDP 7.2.18. Where should the library be installed? I have installed it on the nodes and restarted the Zeppelin service, but I still cannot use, for example, NumPy.

Community Manager

@LSIMS As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.


Regards,

Diana Torres,
Senior Community Moderator

