I have HDP 2.5 installed on my cluster, which runs CentOS 6.5. According to the requirements in the official Hortonworks documentation, I need to use Python 2.6, but with that version I cannot install the numpy, scikit-learn and pandas libraries.
I tried switching the system Python to Python 2.7, but then the "yum" command failed with an error on the cluster.
I also tried installing Python 2.7 alongside the system Python, without making it the system default, and specifying Python 2.7 as the version to use in the Zeppelin pyspark interpreter. I then got an error saying the worker nodes were not using the same Python version.
So my question is: is there any way to use pandas, numpy and scikit-learn in Hortonworks Zeppelin on a CentOS 6.5 cluster?
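My typical recommendation is to use Anaconda Python (https://www.continuum.io/downloads), which provides those libraries and more out of the box. Anaconda installs alongside the standard CentOS Python rather than replacing it, so "yum" keeps working. If you install it at the same path on every node and point pyspark at it, the driver and the workers also run the same Python version, which avoids the mismatch error you are seeing.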
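A minimal sketch for verifying such a setup, assuming Anaconda's Python 2.7 is installed at the same path on every node and pyspark has been pointed at it (for example through the PYSPARK_PYTHON environment variable). The helper names below (missing_modules, python_versions_match, worker_python_versions) are hypothetical, not part of HDP or Zeppelin. They check that the required libraries import in the current interpreter and that the executors report the same Python version as the driver.

```python
import importlib
import sys

# Libraries the question needs; "sklearn" is scikit-learn's import name.
REQUIRED_MODULES = ("numpy", "pandas", "sklearn")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be imported here."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def python_versions_match(driver_version, worker_versions):
    """True if every worker reports the same (major, minor) as the driver."""
    return all(tuple(v) == tuple(driver_version) for v in worker_versions)


def worker_python_versions(sc, partitions=4):
    """Collect the distinct (major, minor) Python versions seen by executors.

    `sc` is assumed to be the SparkContext provided by Zeppelin's
    pyspark interpreter; this function is only useful inside Spark.
    """
    def _version(_):
        # Imported inside the task so it reflects the executor's interpreter.
        import sys as worker_sys
        return tuple(worker_sys.version_info[:2])

    return (
        sc.parallelize(range(partitions), partitions)
        .map(_version)
        .distinct()
        .collect()
    )


if __name__ == "__main__":
    print("driver Python: %s" % sys.version.split()[0])
    print("missing modules: %s" % (missing_modules() or "none"))
```

In a Zeppelin %pyspark paragraph, python_versions_match(sys.version_info[:2], worker_python_versions(sc)) should return True once every node uses the same Anaconda interpreter, and missing_modules() should return an empty list.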