Support Questions

Find answers, ask questions, and share your expertise

Python interpreter not configured in Zeppelin

Expert Contributor

I've checked the list of interpreters installed in my Zeppelin, and Python isn't on the list. For now, to run Python code I use %spark.pyspark.

I'd like to know whether it's a good idea to use pyspark instead of python, and whether it's recommended to have the python interpreter even though pyspark works fine for my Python code.

1 ACCEPTED SOLUTION


@Yassine

Yes, the pyspark interpreter can be used to run Python. However, the application will automatically have references to the Spark libraries. Also note that the pyspark interpreter launches a YARN application, which by default is configured to run with 2 executors. This means you will see an application master plus 2 containers for the running pyspark interpreter.

If you are not really making use of Spark and only write code that does not need to run on the cluster, you should perhaps consider installing just the python interpreter.
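For plain Python that never touches Spark, a paragraph like the sketch below runs the same under either interpreter; only the %-binding at the top of the Zeppelin paragraph differs (shown here as a comment, since it is Zeppelin markup rather than Python):

```python
# In Zeppelin, the first line of the paragraph would be %python
# (or %spark.pyspark if only the pyspark interpreter is installed).
# Nothing below needs Spark or a cluster.
import statistics

readings = [12.1, 11.8, 12.4, 12.0]
print(statistics.mean(readings))
```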

HTH



3 REPLIES


Expert Contributor

What if I want to use Pandas and Matplotlib? Should I use pyspark?


@Yassine Yes, you can use Pandas and Matplotlib along with pyspark. For example, you could use the Spark API to read data from the cluster in parallel, process it, then convert the Spark DataFrame to a Pandas DataFrame and use Matplotlib to plot the results. There are other ways to combine them, but I think this is the most common one I've seen.
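A minimal sketch of that flow. In a real %spark.pyspark paragraph the pandas DataFrame would come from `spark_df.toPandas()` after Spark has done the heavy processing; here the DataFrame is built directly (with made-up sample data) so the sketch runs without a cluster:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for notebook servers
import matplotlib.pyplot as plt

# With pyspark, after processing on the cluster, you would do:
#   pdf = spark_df.toPandas()   # pull the (small) result to the driver
# For this standalone sketch we construct the result directly:
pdf = pd.DataFrame({"key": ["a", "b", "c"], "value": [3, 7, 5]})

# Plot the aggregated result with matplotlib via pandas' plotting API
ax = pdf.plot.bar(x="key", y="value", legend=False)
ax.set_ylabel("value")
plt.savefig("values.png")
```

Note that `toPandas()` collects everything to the driver, so it should only be called on data that has already been reduced to a size that fits in the driver's memory.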