Different Python environments

New Contributor

Currently, we have one big Data Science Python environment and we would like to be able to create multiple environments for different tasks. We build the environment on one machine on the cluster and then distribute it manually to all other nodes. Is there an easy way to ship one environment to all the nodes in a cluster? How can we activate a different environment per notebook without having to create a different interpreter (Spark / Livy) for each environment?

3 Replies

@Kevin Jacobs

Yes, you can use virtual environments. Please review the following link that describes exactly how to do this:

https://community.hortonworks.com/articles/104947/using-virtualenv-with-pyspark.html

HTH

*** If this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.

@Kevin Jacobs I missed the Zeppelin notebook part of the question. I haven't tried this with Zeppelin notebooks, so I'm not sure whether it is possible there. I have used the approach above to submit PySpark apps from the command line.
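
For reference, a minimal sketch of such a command-line invocation, using the conf names that appear later in this thread (the paths, Python version, and application script are placeholders):

# Minimal sketch -- adjust the paths, Python version, and application script for your cluster
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.pyspark.virtualenv.enabled=true \
  --conf spark.pyspark.virtualenv.bin.path=/usr/local/bin/virtualenv \
  --conf spark.pyspark.virtualenv.python_version=3.6 \
  --conf spark.pyspark.virtualenv.requirements=/home/zeppelin/requirements.txt \
  my_pyspark_app.py   # placeholder application script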

New Contributor

Thank you for your answer. We tried it. It worked with spark-submit on the command line, but it did not work in Zeppelin (where we set the same Spark configuration flags).

What we then tried was setting the Spark submit options in Ambari (with no luck):

export SPARK_SUBMIT_OPTIONS="--conf spark.pyspark.virtualenv.enabled=true --conf spark.pyspark.virtualenv.bin.path=/usr/local/bin/virtualenv --conf spark.pyspark.virtualenv.python_version=3.6 --conf spark.pyspark.virtualenv.requirements=/home/zeppelin/requirements.txt"
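
Something we may try next (assuming Zeppelin's Spark interpreter passes arbitrary spark.* properties through to the Spark configuration) is setting the same keys directly as properties on the Spark interpreter in Zeppelin's interpreter settings, instead of going through SPARK_SUBMIT_OPTIONS:

# Same conf keys as above, entered as Zeppelin interpreter properties (values are placeholders)
spark.pyspark.virtualenv.enabled          true
spark.pyspark.virtualenv.bin.path         /usr/local/bin/virtualenv
spark.pyspark.virtualenv.python_version   3.6
spark.pyspark.virtualenv.requirements     /home/zeppelin/requirements.txt

The Spark interpreter would need to be restarted after changing its properties.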