Created 07-02-2018 11:23 AM
I've checked the list of interpreters installed on my Zeppelin, and Python isn't among them. For now, to run Python code I use %spark.pyspark.
I'd like to know whether it's a good idea to use pyspark instead of python, and whether it's recommended to install the python interpreter even though pyspark already runs my Python code fine.
Created 07-02-2018 08:41 PM
Yes, the pyspark interpreter can be used to run Python. However, the process will automatically have the Spark libraries on its path. Also note that the pyspark interpreter launches a YARN application, which by default is configured to run with 2 executors - this means you will see an application master plus 2 containers for the running pyspark interpreter.
If you are not actually making any use of Spark and only write code that doesn't need to run on the cluster, you should perhaps consider installing just the python interpreter.
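To make this concrete, here is a minimal sketch (the paragraph body is my own example, not from this thread): even a paragraph of plain Python started with %spark.pyspark will spin up the YARN application described above. The executor count comes from the standard spark.executor.instances property in the interpreter settings.

```python
%spark.pyspark
# Plain Python - no Spark API used - yet running this paragraph still
# starts a YARN application: 1 application master + 2 executor containers
# (from the interpreter's default spark.executor.instances setting).
import math
print(math.sqrt(2))
```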
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 07-03-2018 09:44 AM
What if I want to use pandas and Matplotlib - should I use pyspark?
Created 07-03-2018 11:31 AM
@Yassine Yes, you can use pandas and Matplotlib along with pyspark. For example, you could use the Spark API to read data from the cluster in parallel, process it, then convert the Spark DataFrame to a pandas DataFrame and use Matplotlib to plot the results. There are other ways to combine them, but I think this is the most common one I've seen.
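A rough sketch of that flow follows; the file path and column names are made up for illustration, and it assumes the `spark` session Zeppelin injects into %spark.pyspark paragraphs with Spark 2.x:

```python
%spark.pyspark
import matplotlib.pyplot as plt

# Read data in parallel with the Spark API (hypothetical path and schema)
df = spark.read.csv("/data/sales.csv", header=True, inferSchema=True)

# Do the heavy aggregation on the cluster...
summary = df.groupBy("region").count()

# ...then pull only the small aggregated result to the driver as pandas
pdf = summary.toPandas()

# Plot with Matplotlib; recent Zeppelin versions render this inline,
# older setups may need an Agg-backend workaround
pdf.plot(x="region", y="count", kind="bar")
plt.show()
```

The key design point is to keep the large dataset in Spark and call toPandas() only on a result small enough to fit in the driver's memory.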