How could I use pandas, numpy and scikit-learn in Centos 6 with Python 2.6.6?

Hi everyone!

I have HDP 2.5 installed on my cluster, which runs CentOS 6.5. According to the official Hortonworks documentation, I need to use Python 2.6, but with that version I cannot install the numpy, scikit-learn and pandas libraries.

I tried changing the system Python version to 2.7, but then I got an error on the cluster when running the "yum" command.

I also tried installing Python 2.7 without changing the system Python, and specifying Python 2.7 as the version to use in the Zeppelin pyspark interpreter. I then got an error saying the worker nodes are not using the same Python version.
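That error comes from PySpark refusing to run when the driver and the executors use different major.minor Python versions (patch levels such as 2.7.5 vs 2.7.13 are tolerated). The rule can be sketched as below; `major_minor` and `check_same_python` are hypothetical helpers written for illustration, not Spark's own code, and the version strings are example values.

```python
def major_minor(version):
    """Return the (major, minor) pair from a version string such as '2.6.6'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_same_python(driver_version, worker_version):
    """Raise RuntimeError if driver and worker differ in major.minor version.

    Hypothetical helper illustrating the rule PySpark enforces.
    """
    if major_minor(driver_version) != major_minor(worker_version):
        raise RuntimeError(
            "worker Python %d.%d differs from driver Python %d.%d"
            % (major_minor(worker_version) + major_minor(driver_version))
        )


if __name__ == "__main__":
    # Python 2.7 set in Zeppelin (driver) vs. the CentOS 6 default on workers.
    try:
        check_same_python("2.7.13", "2.6.6")
    except RuntimeError as err:
        print(err)  # worker Python 2.6 differs from driver Python 2.7
```

In other words, installing a second Python only on the Zeppelin host is not enough: every worker node needs the same major.minor interpreter at a path Spark can find.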

So my question is: is there any way to work with pandas, numpy and scikit-learn in Hortonworks Zeppelin on a CentOS 6.5 cluster?

Thank you so much!


Re: How could I use pandas, numpy and scikit-learn in Centos 6 with Python 2.6.6?

@pfctic2

My typical recommendation is to use Anaconda Python (https://www.continuum.io/downloads), which provides those libraries and more out of the box. When installed, Anaconda does not replace the standard CentOS Python, which avoids the problems you are seeing (for example, yum keeps working because it still uses the system Python 2.6).
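After installing Anaconda on a node, a quick way to confirm which interpreter is running and whether the libraries import is a small script like the one below. It is a sketch, not part of Anaconda; `library_versions` is a hypothetical helper. It uses `__import__` and `%`-formatting so it also runs under the system Python 2.6, where it will simply report the missing libraries.

```python
import sys


def library_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = __import__(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    # With Anaconda this path should be under the Anaconda install
    # directory rather than /usr/bin.
    print("%s %s" % (sys.executable, sys.version.split()[0]))
    # scikit-learn is imported under the module name "sklearn".
    versions = library_versions(["numpy", "pandas", "sklearn"])
    for name in sorted(versions):
        print("%s %s" % (name, versions[name] or "NOT INSTALLED"))
```

Running it once with the Anaconda `python` and once with `/usr/bin/python` on the same node shows the two interpreters side by side.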

See my answer here on how to get Zeppelin and Spark to use Anaconda: https://community.hortonworks.com/questions/53793/zeppelin-pyspark-adding-libraries-numpypandassklea...
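The linked answer is not reproduced here. As a general illustration only: Spark reads the `PYSPARK_PYTHON` environment variable to choose the interpreter for the driver and executors, and `PYSPARK_DRIVER_PYTHON` as a driver-only override. Zeppelin's pyspark interpreter also exposes a `zeppelin.pyspark.python` property (check the name against your Zeppelin version). The sketch below builds such an environment for a hypothetical Anaconda install at `/opt/anaconda2`; `anaconda_spark_env` is a hypothetical helper, and the same path must exist on every node.

```python
import os


def anaconda_spark_env(prefix, base_env=None):
    """Return a copy of base_env with PySpark pointed at an Anaconda interpreter.

    `prefix` is the Anaconda install directory (hypothetical); it must be
    installed at the same path on every node so driver and executors match.
    """
    env = dict(os.environ if base_env is None else base_env)
    python = os.path.join(prefix, "bin", "python")
    env["PYSPARK_PYTHON"] = python         # used by driver and executors
    env["PYSPARK_DRIVER_PYTHON"] = python  # driver-only override, kept identical
    return env


if __name__ == "__main__":
    # /opt/anaconda2 is an assumed install location.
    env = anaconda_spark_env("/opt/anaconda2", base_env={})
    print(env["PYSPARK_PYTHON"])  # /opt/anaconda2/bin/python
```

Pointing both variables at the same binary is what keeps the driver/worker version check from failing, while `/usr/bin/python` stays at 2.6 for yum.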
