How can I use the pandas library in PySpark in Zeppelin?

Hi everyone!

I am working with the PySpark interpreter in a Zeppelin notebook, and I want to use the pandas library, but when I run this command:

import pandas as pd

I get the following error message:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-2633231603377305574.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
ImportError: No module named pandas

I have already installed pandas on the virtual machine where Zeppelin is running and restarted ambari-server, as explained in this post:

http://stackoverflow.com/questions/39221959/zeppelin-unable-to-import-pandas-numpy-scipy/39254183#39...

What else should I do?

1 ACCEPTED SOLUTION

You might need to restart the Spark interpreter (or restart the Zeppelin Notebook service in Ambari) so that the Python remote interpreters pick up the freshly installed pandas and can import it.
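After the restart, it can help to confirm which Python the interpreter is actually using, since pandas may have been installed for a different one. The following is only a rough diagnostic sketch to paste into a %pyspark paragraph; `describe_module` is a hypothetical helper name, not part of Zeppelin or Spark.

```python
import importlib
import sys


def describe_module(name):
    """Return (available, detail) for module `name` in the current interpreter."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # detail carries the import error text, e.g. "No module named pandas"
        return False, str(exc)
    # Not every module defines __version__, so fall back to a placeholder.
    return True, getattr(module, "__version__", "unknown version")


# Which Python binary is running this paragraph, and does it see pandas?
print("Python executable: {}".format(sys.executable))
print("Python version: {}".format(sys.version.split()[0]))
available, detail = describe_module("pandas")
print("pandas available: {} ({})".format(available, detail))
```

If the executable printed here is not the Python that pandas was installed into, the install went to a different environment than the one the interpreter starts.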

If you are running on a cluster, Zeppelin runs in YARN client mode and the Python remote interpreters are started on nodes other than the Zeppelin node. In that case, install pandas on all machines in your cluster and restart Zeppelin.
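To see which nodes are still missing pandas, one option is to run a small probe as a Spark job. This is a sketch under assumptions: `probe_pandas` and `slots` are hypothetical names, `sc` is assumed to be the SparkContext that the %pyspark interpreter provides, and Spark does not guarantee that the tasks reach every node, so a clean result is only an indication.

```python
import socket


def probe_pandas(_):
    """Return (hostname, importable) for pandas in the Python process running this call."""
    try:
        import pandas  # noqa: F401  (only testing whether the import succeeds)
        return socket.gethostname(), True
    except ImportError:
        return socket.gethostname(), False


# `sc` is assumed to be the SparkContext injected by Zeppelin's %pyspark interpreter.
# Outside Zeppelin the name is undefined, so only the local check runs.
if "sc" in globals():
    slots = 20  # hypothetical task count, chosen to make it likely that every node gets a task
    results = sc.parallelize(range(slots), slots).map(probe_pandas).distinct().collect()
    for host, ok in sorted(results):
        print("{}: pandas importable = {}".format(host, ok))
else:
    print("local check: {}".format(probe_pandas(None)))
```

Any host reported with `False` is a machine where pandas still needs to be installed for the Python that the Spark workers use.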
