I have HDP 2.5 installed on my cluster, which runs CentOS 6.5. According to the official Hortonworks documentation, HDP requires Python 2.6, but with that version I cannot install the numpy, scikit-learn and pandas libraries.
I tried changing the system Python version to 2.7, but then the "yum" command failed with an error on the cluster.
I also tried installing Python 2.7 alongside the system Python, without changing the system version, and specifying Python 2.7 as the version to use in the Zeppelin pyspark interpreter. That gave me an error saying the worker nodes are not using the same Python version.
So my question is: is there any way to work with pandas, numpy and scikit-learn in Hortonworks Zeppelin on a CentOS 6.5 cluster?
Thank you so much!
My typical recommendation is to use Anaconda Python (https://www.continuum.io/downloads), which provides those libraries and more out of the box. Installing Anaconda does not replace the standard CentOS Python, so yum keeps working, which avoids the problems you are seeing. Install it at the same path on every node so that the driver and the workers run the same interpreter.
See my answer here on how to get Zeppelin and Spark to use Anaconda: https://community.hortonworks.com/questions/53793/zeppelin-pyspark-adding-libraries-numpypandassklea...
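Once Anaconda is in place, a quick check like the one below can confirm which interpreter is actually running and whether the libraries are visible to it. This is only a minimal sketch: the function name python_env_report is made up for illustration, and the commented-out Spark lines assume the `sc` SparkContext that Zeppelin normally provides in a %pyspark paragraph.

```python
import importlib
import sys


def python_env_report(packages=("numpy", "pandas", "sklearn")):
    """Describe the interpreter running this code and which packages it can import.

    Hypothetical helper for illustration; not part of Zeppelin, Spark or HDP.
    """
    available = {}
    for name in packages:
        try:
            importlib.import_module(name)
            available[name] = True
        except ImportError:
            available[name] = False
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "packages": available,
    }


if __name__ == "__main__":
    # Report for the local interpreter (the driver, when run in a notebook).
    print(python_env_report())

    # Assumption: inside a Zeppelin %pyspark paragraph, `sc` is the SparkContext.
    # Running the same check on the executors would look roughly like this:
    #   reports = (sc.parallelize(range(sc.defaultParallelism))
    #                .map(lambda _: python_env_report())
    #                .collect())
    #   print(set(r["executable"] for r in reports))
```

If the driver and the executors report different executables or versions, that matches the "worker nodes are not using the same Python version" error from the question.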