Support Questions

anandi · ‎08-29-2016

I am using Zeppelin with PySpark (1.6.* or 2.0.0). How can I add Python libraries such as NumPy, pandas, or scikit-learn?

Additional question:

If I install Anaconda Python and its repo, how do I need to configure the Zeppelin interpreters so that PySpark works well with the Anaconda Python installation?

myoung · ‎09-02-2016

@Amit Nandi

If you want to use libraries that are not included in the standard Python distribution, you have to ensure those libraries are installed on every server where the Spark job is going to run.
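One way to check whether the libraries are actually present on the worker nodes is to run a small probe job from a Zeppelin PySpark paragraph. The following is only a minimal sketch: the helper names `missing_on_this_host` and `missing_by_host` and the `LIBRARIES` list are made up for illustration, and `sc` is assumed to be the SparkContext that Zeppelin provides. Spark decides where tasks run, so the result is a sample of hosts, not a guaranteed check of every node.

```python
import importlib
import socket

# Import names to probe for (illustrative; adjust to your needs).
LIBRARIES = ["numpy", "pandas", "sklearn"]

def missing_on_this_host(_=None):
    """Return (hostname, tuple of libraries that fail to import here)."""
    missing = []
    for name in LIBRARIES:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return (socket.gethostname(), tuple(missing))

def missing_by_host(sc, num_tasks=100):
    """Run the probe in num_tasks Spark tasks and collect distinct results.

    sc is assumed to be an existing SparkContext. Tasks may not land on
    every worker, so a host that does not appear was simply not sampled.
    """
    results = sc.parallelize(range(num_tasks), num_tasks) \
                .map(missing_on_this_host) \
                .collect()
    return sorted(set(results))
```

In a Zeppelin paragraph, calling `missing_by_host(sc)` would return a list of `(hostname, missing_libraries)` pairs; an empty tuple for a host means all probed libraries imported successfully there.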

As you may be aware, using something like Anaconda Python makes that process much easier. To ensure Zeppelin uses that version of Python when you use the Python interpreter, set the zeppelin.python property to the path of the Anaconda Python executable.

https://github.com/apache/zeppelin/blob/master/docs/interpreter/python.md

You should also set the PYSPARK_DRIVER_PYTHON environment variable so that Spark uses Anaconda. You can get more information here:

https://spark.apache.org/docs/1.6.2/programming-guide.html
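After changing these settings and restarting the interpreter, it can help to confirm which Python the driver is actually running. The sketch below is an illustration only: `interpreter_info` is a hypothetical helper name, and it reports PYSPARK_PYTHON alongside PYSPARK_DRIVER_PYTHON because the Spark programming guide describes PYSPARK_PYTHON as the variable that selects the Python version PySpark uses.

```python
import os
import sys

def interpreter_info():
    """Describe the Python interpreter running this code."""
    return {
        # Full path of the running Python binary; with Anaconda configured
        # this should point into the Anaconda installation.
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # None means the variable is not set in this process's environment.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": os.environ.get("PYSPARK_DRIVER_PYTHON"),
    }

for key, value in interpreter_info().items():
    print(key, "=", value)
```

If the printed executable is still the system Python, the interpreter setting or environment variable has probably not been picked up yet.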


eyouyan · ‎08-31-2016

I ran into the same issue. See:

http://stackoverflow.com/questions/39221959/zeppelin-unable-to-import-pandas-numpy-scipy/39254183#39...
