Created on 11-23-2017 08:06 AM - edited 08-17-2019 09:24 PM
Hello guys,
I still have this problem. I have installed pandas on the system, and after installation I restarted the Zeppelin notebook.
But it still shows me the message: no module named pandas. Someone advised me to verify the Spark interpreter setting "zeppelin.pyspark.python". I tried putting "anaconda" there instead of "python", but it didn't work. I also tried putting the location of Anaconda, /root/yes/bin/anaconda, but that didn't work either. So what should I put there? Does anybody know how to solve this problem? Or what is the easiest way to import the pandas module into Hortonworks?
Created 11-23-2017 09:19 AM
I see two different versions of Python in your screenshots. In the Zeppelin notebook error, I see that Python 2.6 is deprecated. In your terminal I see Python 3.6.3, which is part of Anaconda.
If you want to use the Python from Anaconda, export this env variable in Ambari (Advanced zeppelin-env -> zeppelin_env_content):
export PYSPARK_DRIVER_PYTHON=<path to anaconda python>
In your interpreter, change zeppelin.pyspark.python = <path to anaconda python>
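To confirm which Python the interpreter actually picked up after restarting, a quick check (a sketch, assuming you run it in a Zeppelin %pyspark paragraph) is:

```python
import sys

# Shows which Python binary the interpreter launched and
# whether pandas is importable from that environment.
print(sys.executable)

try:
    import pandas
    print("pandas", pandas.__version__)
except ImportError:
    print("pandas is not available to this interpreter")
```

If the printed path is not the Anaconda python, the env variable or interpreter setting has not taken effect.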
Thanks,
Aditya
Created 11-23-2017 06:23 PM
Hi enzo,
I may have the answer, but I need a couple of details:
Kind regards,
Paul
Created on 11-24-2017 09:05 AM - edited 08-17-2019 09:24 PM
Hi @Aditya Sirna, I can't find PYSPARK_DRIVER_PYTHON in Advanced zeppelin-env -> zeppelin_env_content.
Check the configuration in the attachment. So should I add a new line to this configuration file?
export PYSPARK_DRIVER_PYTHON=/root/yes/anaconda
and change zeppelin.pyspark.python = /root/yes/anaconda. Like this?
Created on 11-24-2017 09:40 AM - edited 08-17-2019 09:24 PM
Hi @enzo EL
If you can't find this property, please add it.
These are my settings for a Zeppelin stand-alone version 0.7.3 with HDP 2.5 and Anaconda 3 with Python 3.5.
(I am using Spark 2.0.0, and this PySpark version does not work well with Python 3.6.)
export PYTHONPATH="/var/opt/teradata/anaconda3/envs/py35/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/python"
export PYSPARK_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/python"
export PYLIB="/var/opt/teradata/anaconda3/envs/py35/lib"
In your case:
PYTHONPATH = "path-to-your-python/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip"
PYSPARK_DRIVER_PYTHON = "path-to-your-python/bin/python"
PYSPARK_PYTHON = "path-to-your-python/bin/python"
According to the documentation, both variables above must use the same version of Python in order to work properly: "PySpark requires the same minor version of Python in both driver and workers".
I also added PYLIB to the configuration, but I think it is not necessary.
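A simple way to verify that requirement is to ask each binary for its minor version and compare. A sketch (here both calls use the current interpreter as a stand-in; substitute the actual paths you set for PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON):

```python
import subprocess
import sys

def minor_version(python_path):
    """Return the minor version string (e.g. '3.5') reported by a Python binary."""
    out = subprocess.check_output(
        [python_path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
    )
    return out.decode().strip()

# Stand-in paths: replace with your PYSPARK_DRIVER_PYTHON / PYSPARK_PYTHON values.
driver = minor_version(sys.executable)
worker = minor_version(sys.executable)
assert driver == worker, "driver and worker Python minor versions differ"
print(driver)
```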
On the Zeppelin Interpreters page I created a new interpreter for Python:
This should be the result:
You also need to adjust the spark interpreter:
The last configuration, and this was extremely tricky: do not add python3 to your PATH env!
Instead, create a symlink to conda.
Example: ln -s /opt/anaconda3/bin/conda /bin/conda
Or to another location already in your PATH, like /usr/lib or /var/lib etc.
Additionally, you can test the installation as I did:
Python interpreter test
PySpark and pandas interaction: converting a Spark DataFrame to a pandas DataFrame
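The test screenshots are not preserved here, so as a sketch of the kind of check shown: the first part exercises pandas directly (the Python interpreter test), and the commented lines illustrate the Spark-to-pandas conversion for a %pyspark paragraph (they assume Zeppelin's built-in sqlContext).

```python
import pandas as pd

# Python interpreter test: build a small pandas DataFrame.
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})
print(df.head())

# %pyspark half of the test (run in a %pyspark paragraph; assumes
# Zeppelin's built-in sqlContext):
# spark_df = sqlContext.createDataFrame([("a", 1), ("b", 2)], ["name", "value"])
# pandas_df = spark_df.toPandas()  # Spark DataFrame -> pandas DataFrame
# print(pandas_df.head())
```

If the first part fails with "no module named pandas", the interpreter is still not using the Anaconda Python.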
I hope this helps you.
Kind regards,
Paul
Created on 11-24-2017 10:00 AM - edited 08-17-2019 09:23 PM
Hi @Paul Hernandez, where can I find "path-to-your-python/bin/python"? When I type `which anaconda` it gives me back
/root/yes/bin/anaconda. So should I put this as the path? That means:
export PYTHONPATH="/root/yes/bin/anaconda/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/root/yes/bin/anaconda"
export PYSPARK_PYTHON="/root/yes/bin/anaconda"
export PYLIB="/var/opt/teradata/anaconda3/envs/py35/lib" - here I can't find this folder..
Thank you for helping me.
Created 11-24-2017 10:08 AM
Hi @enzo EL
In your case:
export PYTHONPATH="/root/yes/bin/anaconda/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/root/yes/bin/anaconda/bin/python"
export PYSPARK_PYTHON="/root/yes/bin/anaconda/bin/python"
This should work.
BTW, why have you installed Anaconda in this location?
Kind regards, Paul
Created on 11-24-2017 10:25 AM - edited 08-17-2019 09:23 PM
The location was the default, I don't know. I have another question: I am trying to create the Python interpreter, but I can't edit the
interpreter group. I am following your guide. How can I create it?
Created 11-24-2017 10:32 AM
What is your Zeppelin version? It seems like a 0.6.x version.
You may be able to use Pandas without configuring the python interpreter.
Created on 11-24-2017 11:14 AM - edited 08-17-2019 09:23 PM
My version is 0.7.0.2.6 (screenshot). I have changed the path and restarted the notebook, but it doesn't work.. :/
What should I do? I need pandas for plotting.. :/
Created 11-25-2017 07:19 AM
Hi @enzo EL
1) If you just need pandas with PySpark, just test it with the example I provided for the Spark interpreter.
2) It seems the Python interpreter first became available with Zeppelin 0.7.2. Is an upgrade possible for you?
3) You can add unavailable interpreters by following the official documentation: https://zeppelin.apache.org/docs/0.7.0/manual/interpreterinstallation.html
I have never done it before, but it should work.