Created 08-29-2016 02:07 PM
Zeppelin + PySpark (1.6.* or 2.0.0) - I want to know how I can add Python libraries such as NumPy/Pandas/scikit-learn.
Additional question:
If I install Anaconda Python and its package repository, how do I configure the Zeppelin interpreters so that PySpark works with the Anaconda Python environment?
Created 09-02-2016 12:57 PM
If you want to use libraries that are not included in the standard Python distribution, you have to ensure those libraries are installed on every server where the Spark job will run.
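One common way to do that, if you don't have a parallel-shell tool like pdsh, is a plain SSH loop over the worker nodes. A minimal sketch - the hostnames, the Anaconda path, and the package list are all assumptions to adjust for your cluster:

```shell
#!/bin/sh
# Hypothetical worker hostnames and Anaconda path -- replace with your cluster's values.
NODES="worker1 worker2 worker3"
PIP=/opt/anaconda/bin/pip
PKGS="numpy pandas scikit-learn"

# Shown as a dry run (echo); drop the echo to actually run the installs over SSH.
for node in $NODES; do
  echo ssh "$node" "$PIP install $PKGS"
done
```

The point is simply that the same libraries end up under the same Python path on every node, so an executor never fails with an ImportError that the driver does not see.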
As you may be aware, using something like Anaconda Python makes that process much easier. To ensure Zeppelin uses that version of Python when you use the Python interpreter, set the zeppelin.python property to the path of the Anaconda Python binary.
https://github.com/apache/zeppelin/blob/master/docs/interpreter/python.md
You should also set the PYSPARK_DRIVER_PYTHON environment variable so that the Spark driver uses Anaconda as well.
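In practice that means two things: the environment variables Zeppelin passes to Spark, and the zeppelin.python setting in the interpreter UI. A minimal sketch of the first part, assuming Anaconda is installed at /opt/anaconda (PYSPARK_PYTHON for the executors is an addition beyond the variable named above; adjust the path to your install):

```shell
# conf/zeppelin-env.sh -- point both the Spark driver and executors at Anaconda.
export PYSPARK_PYTHON=/opt/anaconda/bin/python         # Python used by Spark executors
export PYSPARK_DRIVER_PYTHON=/opt/anaconda/bin/python  # Python used by the Spark driver
```

In the Zeppelin interpreter settings, also set zeppelin.python to /opt/anaconda/bin/python for the plain Python interpreter, then restart the interpreters so the changes take effect.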
Created 08-31-2016 08:15 PM
I ran into the same issue.
Created 05-30-2018 12:11 PM
@amit nandi can you provide step-by-step instructions on how to install Anaconda for HDP?