Configuring CDH cluster with Python 3

Explorer

Hi All,

We are using CDH 5.8.3 community version, and we want to add support for Python 3.5+ to our cluster, since our research algorithms need Python 3.5+ for their Spark jobs to run successfully.

I know that Cloudera and Anaconda offer a parcel that supports Python, but that parcel only supports Python 2.7.

What is the recommended way to enable Python 3+ on a CDH cluster?

Best,
Eyal

1 ACCEPTED SOLUTION

Explorer
Hi MKay,

As mentioned in my previous posts, the Anaconda parcel for CDH comes only with Python 2.7, and I could not find a free way to get a parcel with Python 3+.

We ended up manually installing the different Python versions we needed, keeping a separate virtual env for each Python version.

We ran the following procedure to install Python 3.5:

# Install pip and virtualenv for the system Python
yum install python-pip
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
python get-pip.py
pip install virtualenv
# Add the IUS repository and install Python 3.5 from it
yum install -y https://centos7.iuscommunity.org/ius-release.rpm
yum install -y python35u python35u-libs python35u-devel python35u-pip
# Create and activate a virtualenv that uses Python 3.5
mkdir -p /opt/venv35
cd /opt/venv35
virtualenv venv35 -p python3.5
source venv35/bin/activate
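
Since the idea is one virtual env per Python version, a small helper can resolve the interpreter for a given version and confirm it exists on a node. This is only a sketch: it assumes every env follows the /opt/venv35/venv35 layout from the commands above, and venv_python and check_venv_python are hypothetical names, not part of virtualenv or CDH.

```shell
# Sketch only: resolve the interpreter inside a per-version virtualenv.
# Assumption: every venv lives at $VENV_ROOT/venv<XY>/venv<XY> for
# Python X.Y, matching the /opt/venv35/venv35 layout used above.
VENV_ROOT="${VENV_ROOT:-/opt}"

venv_python() {
    # Strip the dot from the version: "3.5" -> "35"
    tag=$(printf '%s' "$1" | tr -d '.')
    printf '%s\n' "$VENV_ROOT/venv$tag/venv$tag/bin/python"
}

# Print the interpreter's version, or fail if the venv is not on this host.
check_venv_python() {
    py=$(venv_python "$1")
    if [ -x "$py" ]; then
        "$py" --version
    else
        echo "missing interpreter: $py" >&2
        return 1
    fi
}
```

Running check_venv_python 3.5 on each node is one way to confirm the env is in place before any job depends on it.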

Best, Eyal


9 REPLIES

Expert Contributor

Explorer

Isn't there a better option, like the Cloudera-Anaconda parcel, which can be managed using CM?

Explorer

Hi All,

Any additional thoughts on the question I asked?

Best,
Eyal

Explorer

Hi Divyani,

Is the solution you offered the best one (the post you shared is from Sep 2015)?

Best,
Eyal

Expert Contributor

Hi Eyal,

Did you check this post from Cloudera about the Anaconda parcel?

http://blog.cloudera.com/blog/2016/02/making-python-on-apache-hadoop-easier-with-anaconda-and-cdh/

Explorer

Hi Divyani,

Thanks for the link you shared, but again, the Anaconda parcel only comes with Python 2.7 and I need Python 3.5.

So if I want to enable Python 3.5 in the cluster, what are the recommended methods?

What if I want to enable multiple Python versions in the cluster and let each app run with its own Python version?

Best, Eyal

Expert Contributor

Hi,

Continuum ships the Anaconda parcel, and Cloudera has no control over which Python version it installs.

Please use the OS package management tool to install Python 3.5 on the servers in the CDH cluster. Once that is done, please follow this doc to set the Python interpreter for your PySpark job:

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/spark_python.html#spark_python__secti...
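
As a rough illustration of setting Python per job, the sketch below wraps spark-submit so that each application can name its own interpreter. PYSPARK_PYTHON is the environment variable Spark reads to pick the Python executable, and spark.yarn.appMasterEnv.PYSPARK_PYTHON passes it to the application master for YARN cluster mode. The function name submit_with_python, the DRY_RUN switch, and the example paths are hypothetical, and the interpreter is assumed to exist at the same path on every node.

```shell
# Sketch only: submit one PySpark job with a chosen interpreter.
# Assumption: the interpreter exists at the same path on every node.
# Set DRY_RUN=1 to print the command instead of running it.
submit_with_python() {
    py="$1"   # path to the Python interpreter for this job
    shift     # remaining arguments go to spark-submit unchanged
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "PYSPARK_PYTHON=$py spark-submit --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=$py $*"
    else
        PYSPARK_PYTHON="$py" spark-submit \
            --conf "spark.yarn.appMasterEnv.PYSPARK_PYTHON=$py" \
            "$@"
    fi
}
```

For example, submit_with_python /usr/bin/python3.5 job.py would run job.py (a placeholder script name) on Python 3.5, while another application could pass a different interpreter path.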

New Contributor

Can you give some steps and instructions for installing Python 3.5 or the Anaconda package on a CDH cluster? The parcel approach is not working as expected: the parcel shows as distributed and activated, but it does not come with Python 3.5; it is still using Python 2.7. Please let me know if there is any document for manually installing Anaconda with Python 3.5 on a CDH cluster from the command line.
