
Suggestions on Distributing and activating anaconda


Expert Contributor

Hi guys,

I have installed Anaconda and built a virtual environment on a node outside of the HDP cluster. To use it with Spark, the environment needs to be available on the HDP cluster itself. I have a couple of questions around this.

1) Do we need to install Anaconda on all the nodes? We would like to avoid this if possible, since we have no internet access from the cluster and the Anaconda installation downloads libraries as it runs. I did not find officially supported offline repos for Linux installations.

2) If we need to distribute the environment by copying it to all the nodes before starting any Spark applications, then when submitting the Spark job from the edge node, how do we make sure the job uses the Anaconda virtual environment? (On a single node this is easy, since we can simply switch Anaconda environments.)
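
For reference, a common pattern for this situation is to pack the environment once on the edge node and let YARN distribute it with the job, instead of installing Anaconda on every node. Below is a minimal sketch; the environment name myenv, the archive path, and my_job.py are all illustrative, and note that a plain zip of a conda environment can break its hard-coded paths (making environments relocatable is what tools like conda-pack were later built for):

# Pack the environment once on the edge node (illustrative paths).
cd /opt/anaconda3/envs/myenv
zip -r /tmp/myenv.zip .

# Submit in yarn-cluster mode; YARN unpacks the archive into every
# container's working directory under the alias MYENV.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives /tmp/myenv.zip#MYENV \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./MYENV/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./MYENV/bin/python \
  my_job.py

With this approach nothing needs to be pre-installed on the worker nodes, which would also address question 1.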

Thanks,

SS

3 REPLIES

Re: Suggestions on Distributing and activating anaconda

New Contributor

Hi,

I am also working on a similar task.

So far I am using Zeppelin, and I configured a python interpreter to work with anaconda3 rather than with the Python that ships with HDP 2.5.

[Screenshot: python-interpreter.png, the Zeppelin python interpreter configuration]
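
For readers without the attachment, the setting behind that screenshot is presumably the stock python interpreter binding; a sketch of what it likely contains (assumed, not recovered from the image):

zeppelin.python = /opt/anaconda3/bin/python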

The next thing I did was to change these Zeppelin settings in zeppelin-env.sh:

export PYTHONPATH="/opt/anaconda3/bin/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/opt/anaconda3/bin/ipython"
export PYSPARK_PYTHON="/opt/anaconda3/bin/python"
export PYLIB="${SPARK_HOME}/python/lib"

But I think I have a conceptual problem and I am not understanding how everything works. With these settings the %spark2.pyspark interpreter is not working as expected and doesn't find packages like pandas.
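
A likely culprit is the PYTHONPATH line above: it should list Spark's Python library locations and the py4j zip, not the python binary itself. A minimal corrected sketch of the same block (the py4j version is taken from the original line and is release-specific; the /opt/anaconda3 paths are from this post):

# zeppelin-env.sh, sketch only
export PYSPARK_PYTHON="/opt/anaconda3/bin/python"           # Python used on the workers
export PYSPARK_DRIVER_PYTHON="/opt/anaconda3/bin/ipython"   # driver-side shell
export PYLIB="${SPARK_HOME}/python/lib"
# library locations, not the interpreter binary:
export PYTHONPATH="${SPARK_HOME}/python:${PYLIB}/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"

Note that this still assumes /opt/anaconda3 exists at the same path on every worker node, which circles back to question 1 of the original post.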

My next try was to modify the spark2 interpreter as follows:

zeppelin.pyspark.python = /opt/anaconda3/bin/python

But with this change the pyspark interpreter stops responding altogether.
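
For what it's worth, zeppelin.pyspark.python generally controls only the driver-side Python process; the executors resolve their interpreter separately, so the two are usually pointed at the same binary together. A sketch of the spark2 interpreter properties, assuming /opt/anaconda3 is present at the same path on all worker nodes:

zeppelin.pyspark.python = /opt/anaconda3/bin/python
spark.executorEnv.PYSPARK_PYTHON = /opt/anaconda3/bin/python
spark.yarn.appMasterEnv.PYSPARK_PYTHON = /opt/anaconda3/bin/python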

Can any specialist explain to us how all of these components interact with each other?

I would really appreciate it.

Kind regards,

Paul

Re: Suggestions on Distributing and activating anaconda

New Contributor

Did anyone find an answer to this? We're going to need to do something similar...

Re: Suggestions on Distributing and activating anaconda

New Contributor

Hi @vromeo

I raised this question on Stack Overflow and received an acceptable answer: https://stackoverflow.com/questions/47198678/zeppelin-python-conda-and-python-sql-interpreters-do-no...

Kind regards,

Paul
