Created on 02-09-2017 04:34 PM - edited 09-16-2022 01:38 AM
Environment
- HDP 2.5.x
- Ambari 2.4.x
Problem
I need to use Anaconda for %livy.pyspark. Currently it uses the default Python 2.6:
    %livy.pyspark
    import sys
    print(sys.path)
    -----------------------------------------
    ['/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/container_e11_1483612761447_100542_01_000001/tmp',
     u'/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/spark-0e25c417-e8c0-4b60-b167-789dc3293bd7/userFiles-a1d6eced-0de1-4e3d-9bc1-f5aea925915d/py4j-0.9-src.zip',
     u'/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/spark-0e25c417-e8c0-4b60-b167-789dc3293bd7/userFiles-a1d6eced-0de1-4e3d-9bc1-f5aea925915d/pyspark.zip',
     u'/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/spark-0e25c417-e8c0-4b60-b167-789dc3293bd7/userFiles-a1d6eced-0de1-4e3d-9bc1-f5aea925915d',
     '/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/container_e11_1483612761447_100542_01_000001/pyspark.zip',
     '/var/hadoop/b/yarn/local/usercache/<user>/appcache/application_1483612761447_100542/container_e11_1483612761447_100542_01_000001/py4j-0.9-src.zip',
     '/usr/lib64/python26.zip',
     '/usr/lib64/python2.6',
     '/usr/lib64/python2.6/plat-linux2',
     '/usr/lib64/python2.6/lib-tk',
     '/usr/lib64/python2.6/lib-old',
     '/usr/lib64/python2.6/lib-dynload',
     '/usr/lib64/python2.6/site-packages',
     '/usr/lib64/python2.6/site-packages/gtk-2.0',
     '/usr/lib/python2.6/site-packages']
How can I get this working with python-3.5?
Solution
Based on https://issues.apache.org/jira/browse/ZEPPELIN-1609, a new PySpark interpreter, %livy.pyspark3, has been implemented for Livy. It is delivered in Zeppelin 0.7, which comes with HDP 2.6.
For now, do the following:
- Go to Ambari UI -> Spark -> Config -> Advanced livy-env -> content and set both variables to the path of the new Python version:
  - export PYSPARK_PYTHON=/opt/anaconda/bin/python
  - export PYSPARK_DRIVER_PYTHON=/opt/anaconda/bin/python
- Save the changes
- Restart all required services
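After the restart, one way to confirm that the new interpreter is picked up is to print the interpreter path and version from a notebook paragraph. The snippet below is only a minimal sketch: the helper name describe_interpreter is hypothetical, and in Zeppelin the paragraph would additionally start with the %livy.pyspark line shown in the question.

```python
import sys


def describe_interpreter():
    """Return the path and version of the Python interpreter running this code.

    Hypothetical helper, not part of Livy or Zeppelin.
    """
    return {
        "executable": sys.executable,  # e.g. the Anaconda binary if the change took effect
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
    }


info = describe_interpreter()
print(info["executable"])
print(info["version"])
```

If the executable still points at the system Python, the livy-env change has probably not been applied to the running Livy server yet.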
Created on 02-13-2017 01:38 AM
livy-env.sh is shared by all sessions, which means one Livy instance can run only one version of Python. I would recommend using the Spark configuration properties spark.pyspark.driver.python and spark.pyspark.python in Spark 2 (HDP 2.6) so that each session can set its own Python version. See https://issues.apache.org/jira/browse/SPARK-13081
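As a rough illustration of the per-session approach, the sketch below builds the JSON body that could be sent to Livy's POST /sessions endpoint, passing the two Spark properties through the request's conf map. The helper name build_session_payload is hypothetical, the Anaconda path is reused from the article above, and the snippet assumes a Spark version that includes SPARK-13081. It only constructs and prints the payload; it does not contact a Livy server.

```python
import json


def build_session_payload(python_path, kind="pyspark"):
    """Build a request body for creating a Livy session with its own Python.

    Hypothetical helper: it only assembles the dictionary, nothing is sent.
    """
    return {
        "kind": kind,
        "conf": {
            # Python used by the executors
            "spark.pyspark.python": python_path,
            # Python used by the driver
            "spark.pyspark.driver.python": python_path,
        },
    }


# Assumption: Anaconda is installed at this path on every node.
payload = build_session_payload("/opt/anaconda/bin/python")
print(json.dumps(payload, indent=2, sort_keys=True))
```

Because the settings travel with the session request instead of livy-env.sh, two sessions on the same Livy instance could in principle point at different Python installations.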