Zeppelin + PySpark - Adding Libraries Numpy/Pandas/SKLearn...

Zeppelin + PySpark (1.6.* or 2.0.0): I want to know how I can add Python libraries such as NumPy, pandas, scikit-learn...

Additional question:

If I install Anaconda Python and its repo, how do I need to configure the Zeppelin interpreters so that PySpark works well with the Anaconda Python repo?

1 ACCEPTED SOLUTION

@Amit Nandi

If you want to use libraries that are not included in the standard Python distribution, you have to ensure those libraries are installed on every server where the Spark job is going to run.
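One way to see whether the libraries really are available where the tasks run is to try importing them inside Spark tasks. The following is only a minimal sketch, not an official procedure: the helper names `probe_modules` and `probe_cluster` are made up for illustration, and `sc` is assumed to be the SparkContext that a Zeppelin `%pyspark` paragraph provides. Spark decides where tasks run, so a host that happens to receive no task is not checked.

```python
import importlib
import socket


def probe_modules(module_names):
    """Return {module name: version string, or None if the import fails} for this process."""
    found = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module exposes __version__, so fall back to a marker.
            found[name] = getattr(module, "__version__", "unknown")
    return found


def probe_cluster(sc, module_names, num_tasks=20):
    """Run probe_modules inside Spark tasks and key the results by hostname.

    sc is assumed to be an existing SparkContext. Hosts that receive
    no task do not appear in the result.
    """
    names = list(module_names)
    pairs = (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(lambda _: (socket.gethostname(), probe_modules(names)))
        .collect()
    )
    return dict(pairs)


# Check the local (driver-side) process first.
print(probe_modules(["numpy", "pandas", "sklearn"]))
```

In a `%pyspark` paragraph, `probe_cluster(sc, ["numpy", "pandas", "sklearn"])` would then return one entry per host that ran a task. Any `None` value points at a server where the library still has to be installed.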

As you may be aware, using something like Anaconda Python makes that process much easier. To ensure Zeppelin uses that version of Python when you use the Python interpreter, set the zeppelin.python setting to the path of the Anaconda Python executable.

https://github.com/apache/zeppelin/blob/master/docs/interpreter/python.md
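After changing zeppelin.python and restarting the interpreter, a quick check in a `%python` paragraph can confirm which Python is actually running. This is a rough sketch: `interpreter_summary` is a hypothetical helper, and `/opt/anaconda` is only an assumed install location, so substitute the directory where Anaconda lives on your server.

```python
import os
import sys


def interpreter_summary(expected_prefix):
    """Describe the running interpreter and whether it lives under expected_prefix."""
    executable = os.path.realpath(sys.executable)
    prefix = os.path.realpath(expected_prefix)
    return {
        "executable": executable,
        "version": sys.version.split()[0],
        "under_expected_prefix": executable.startswith(prefix + os.sep),
    }


# "/opt/anaconda" is an assumed path, not something Zeppelin or Anaconda mandates.
print(interpreter_summary("/opt/anaconda"))
```

If `under_expected_prefix` is False, the paragraph is still running on a different Python, typically the system one.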

You should set your PYSPARK_DRIVER_PYTHON environment variable so that Spark uses Anaconda. You can get more information here:

https://spark.apache.org/docs/1.6.2/programming-guide.html
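Spark's documentation describes two related variables: PYSPARK_PYTHON selects the Python binary for both the driver and the workers, while PYSPARK_DRIVER_PYTHON applies to the driver only. The sketch below shows one way to see what a `%pyspark` paragraph ended up with. The function names are hypothetical, and `sc` is assumed to be the SparkContext that Zeppelin provides.

```python
import os
import sys


def pyspark_python_settings(environ=None):
    """Return the PySpark interpreter variables visible to this process (None if unset)."""
    environ = os.environ if environ is None else environ
    names = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")
    return {name: environ.get(name) for name in names}


def executor_pythons(sc, num_tasks=20):
    """Return the distinct Python executables reported by Spark tasks.

    sc is assumed to be an existing SparkContext.
    """
    paths = (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(lambda _: sys.executable)
        .collect()
    )
    return sorted(set(paths))


# Driver side: which variables are set, and which Python is running.
print(pyspark_python_settings())
print(sys.executable)
```

Calling `executor_pythons(sc)` in the same paragraph lists the interpreters the tasks used. If the driver runs Anaconda but the tasks report a system Python, the worker side still needs to be pointed at Anaconda.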

3 REPLIES

@Amit Nandi Can you provide step-by-step instructions on how to install Anaconda for HDP?