
Environment

  • HDP 2.5.x
  • Ambari 2.4.x

Problem

I need to use Anaconda Python with %livy.pyspark. Currently, it is using the default Python 2.6:

%livy.pyspark
import sys
print(sys.path)
-----------------------------------------
['/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/container_e11_1483612761447_100542_01_000001/tmp', u'/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/spark-0e25c417-e8c0-4b60-b167-789dc3293bd7/userFiles-a1d6eced-0de1-4e3d-9bc1-f5aea925915d/py4j-0.9-src.zip', u'/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/spark-0e25c417-e8c0-4b60-b167-789dc3293bd7/userFiles-a1d6eced-0de1-4e3d-9bc1-f5aea925915d/pyspark.zip', u'/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/spark-0e25c417-e8c0-4b60-b167-789dc3293bd7/userFiles-a1d6eced-0de1-4e3d-9bc1-f5aea925915d', '/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/container_e11_1483612761447_100542_01_000001/pyspark.zip', '/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/container_e11_1483612761447_100542_01_000001/py4j-0.9-src.zip', '/usr/lib64/python26.zip', '/usr/lib64/python2.6', '/usr/lib64/python2.6/plat-linux2', '/usr/lib64/python2.6/lib-tk', '/usr/lib64/python2.6/lib-old', '/usr/lib64/python2.6/lib-dynload', '/usr/lib64/python2.6/site-packages', '/usr/lib64/python2.6/site-packages/gtk-2.0', '/usr/lib/python2.6/site-packages']

How can I get this working with Python 3.5?
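For reference, the interpreter can also be checked directly with sys.version and sys.executable (a minimal sketch of the same kind of paragraph):

%livy.pyspark
import sys
print(sys.version)     # reports the Python version in use, e.g. 2.6.x here
print(sys.executable)  # path of the Python binary the driver is running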

Solution

Per ZEPPELIN-1609 (https://issues.apache.org/jira/browse/ZEPPELIN-1609), a new pyspark3 interpreter, %livy.pyspark3, has been implemented within Livy. It ships with Zeppelin 0.7, which comes with HDP 2.6.
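On Zeppelin 0.7, the same check run through that interpreter should report a Python 3 interpreter (a sketch; the exact output depends on the cluster's Python 3 installation):

%livy.pyspark3
import sys
print(sys.version)  # should report a 3.x interpreter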

For now, do the following (verification sketch follows the steps):

1. In the Ambari UI, go to Spark -> Configs -> Advanced livy-env -> content.
2. Set export PYSPARK_PYTHON=/opt/anaconda/bin/python (the path to the new Python version).
3. Set export PYSPARK_DRIVER_PYTHON=/opt/anaconda/bin/python (the path to the new Python version).
4. Save the changes.
5. Restart all required services.
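After the restart, a quick check from a notebook paragraph should show the Anaconda interpreter (a sketch; /opt/anaconda is the install path assumed in the steps above):

%livy.pyspark
import sys
print(sys.version)     # should now report the Anaconda Python version
print(sys.executable)  # should point at /opt/anaconda/bin/python, not the system Python 2.6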
Comments

livy-env.sh is shared by all sessions, which means one Livy instance can only run one version of Python. I would recommend using the Spark configurations spark.pyspark.driver.python and spark.pyspark.python in Spark 2 (HDP 2.6) so that each session can set its own Python version. See https://issues.apache.org/jira/browse/SPARK-13081.
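A minimal sketch of that per-session approach through Livy's REST API, assuming Livy listens on its default port 8998 and that the cluster runs Spark 2 (where SPARK-13081 added these properties); "livy-host" and /opt/anaconda3/bin/python are placeholder values:

import requests

# Create a Livy session that carries its own Python interpreter settings,
# so other sessions on the same Livy instance are unaffected.
resp = requests.post(
    "http://livy-host:8998/sessions",
    json={
        "kind": "pyspark",
        "conf": {
            "spark.pyspark.python": "/opt/anaconda3/bin/python",
            "spark.pyspark.driver.python": "/opt/anaconda3/bin/python",
        },
    },
)
print(resp.json())  # session info, including its id and state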