HELP with Pandas into zeppelin

New Contributor

Hello guys,

I still have this problem. I installed pandas on the system and restarted the Zeppelin notebook after the installation.

It still shows me the message "no module named pandas". Someone advised me to check the Spark interpreter setting "zeppelin.pyspark.python". I tried putting anaconda there instead of python, but it didn't work. I also tried the location of Anaconda, /root/yes/bin/anaconda, but that didn't work either. So what should I put there? Does anybody know how to solve this problem, or what the easiest way is to import the pandas module on Hortonworks?

43681-43634-screenshot-106.png

43680-42623-screenshot-4.png

10 REPLIES

Re: HELP with Pandas into zeppelin

@enzo EL,

I see two different versions of Python in your screenshots. The error in the Zeppelin notebook says that Python 2.6 is deprecated, while your terminal shows Python 3.6.3, which is part of Anaconda.
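One way to confirm which interpreter each environment really uses is to run the same short check in a Zeppelin paragraph and in the terminal, then compare the output. The following is a minimal sketch: the function name `describe_python` is made up for illustration, and the code sticks to syntax that runs on both Python 2 and Python 3 because the notebook error mentions Python 2.6.

```python
import sys


def describe_python(module_name="pandas"):
    """Report which interpreter is running and whether a module can be imported."""
    info = {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "module": module_name,
    }
    try:
        module = __import__(module_name)
        info["module_found"] = True
        info["module_location"] = getattr(module, "__file__", None)
    except ImportError:
        info["module_found"] = False
        info["module_location"] = None
    return info


# Print one "key: value" line per entry so the two environments are easy to compare.
for key, value in sorted(describe_python().items()):
    print("%s: %s" % (key, value))
```

If the `executable` line differs between the notebook and the terminal, pandas was probably installed for an interpreter that Zeppelin is not using.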

If you want to use the Python from Anaconda, export this environment variable in Ambari (Advanced zeppelin-env -> zeppelin_env_content):

export PYSPARK_DRIVER_PYTHON=<path to anaconda python>

In your interpreter settings, set zeppelin.pyspark.python = <path to anaconda python>

Thanks,

Aditya

Re: HELP with Pandas into zeppelin

New Contributor

Hi enzo,

I may have the answer, but I need a few details:

  • Hortonworks version
  • Zeppelin version
  • Anaconda and Python version
  • Whether Zeppelin is managed by Ambari or stand-alone

Kind regards,

Paul

Re: HELP with Pandas into zeppelin

New Contributor

43702-screenshot-12.png

Hi @Aditya Sirna, I can't find PYSPARK_DRIVER_PYTHON in Advanced zeppelin-env -> zeppelin_env_content.

Please check the configuration in the attachment. Should I add a new line to this configuration file?

export PYSPARK_DRIVER_PYTHON= /root/yes/anaconda

and change zeppelin.pyspark.python = /root/yes/anaconda. Like this?

43700-screenshot-10.png

43701-screenshot-11.png

Re: HELP with Pandas into zeppelin

New Contributor

Hi @enzo EL

If you can't find this property, please add it.

These are my settings for a stand-alone Zeppelin 0.7.3 with HDP 2.5 and Anaconda3 with Python 3.5.

(I am using Spark 2.0.0, and that PySpark version does not work well with Python 3.6.)

export PYTHONPATH="/var/opt/teradata/anaconda3/envs/py35/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/python"
export PYSPARK_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/python"
export PYLIB="/var/opt/teradata/anaconda3/envs/py35/lib"
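The py4j archive name in PYTHONPATH carries a version number, and that number may differ from one Spark build to another, so it is worth checking which file actually exists on your cluster before copying the line above. The sketch below looks the file up with a glob instead of hard-coding it; the helper names `find_py4j_zips` and `build_pythonpath` are hypothetical, and the directory in the usage comment is simply the one from the settings above.

```python
import glob
import os


def find_py4j_zips(spark_python_lib):
    """Return the py4j source archives found in a Spark python/lib directory."""
    pattern = os.path.join(spark_python_lib, "py4j-*-src.zip")
    return sorted(glob.glob(pattern))


def build_pythonpath(python_bin_dir, spark_python_lib):
    """Join the Python bin directory and the first py4j archive with ':'."""
    zips = find_py4j_zips(spark_python_lib)
    if not zips:
        raise IOError("no py4j-*-src.zip found under %s" % spark_python_lib)
    return ":".join([python_bin_dir, zips[0]])


# Example call (paths taken from the settings above, adjust them to your system):
# print(build_pythonpath("/var/opt/teradata/anaconda3/envs/py35/bin",
#                        "/usr/hdp/current/spark-client/python/lib"))
```

The printed value can then be pasted into the `export PYTHONPATH=...` line.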

In your case:

PYTHONPATH = "path-to-your-python/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip"

PYSPARK_DRIVER_PYTHON = "path-to-your-python/bin/python"

PYSPARK_PYTHON = "path-to-your-python/bin/python"

According to the documentation, both variables above must point to the same version of Python in order to work properly: "PySpark requires the same minor version of Python in both driver and workers".

I also added PYLIB to the configuration, but I think it is not necessary.
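The minor-version requirement quoted above can be checked before restarting Zeppelin by asking each configured interpreter for its version. This is only a sketch with hypothetical helper names, and it only inspects interpreters on the machine where it runs; on a multi-node cluster the worker path would have to be checked on every node.

```python
import subprocess
import sys


def python_minor_version(python_path):
    """Ask the interpreter at python_path for its (major, minor) version."""
    code = "import sys; print('%d.%d' % tuple(sys.version_info[:2]))"
    output = subprocess.check_output([python_path, "-c", code])
    major, minor = output.decode("ascii").strip().split(".")
    return int(major), int(minor)


def same_minor_version(driver_python, worker_python):
    """True when both interpreters report the same major.minor version."""
    return python_minor_version(driver_python) == python_minor_version(worker_python)


# Comparing the current interpreter with itself is a quick smoke test.
# In practice, pass the values of PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON.
print(same_minor_version(sys.executable, sys.executable))
```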

On the Zeppelin Interpreters page I created a new interpreter for Python:

42745-create-python-interpreter.png

This should be the result:

42746-python-interpreter.png

You also need to adjust the spark interpreter:

42747-spark-int-python.jpg

The last configuration step was extremely tricky: do not add python3 to your PATH environment variable!

Instead, create a symlink to conda.

Example:  ln -s /opt/anaconda3/bin/conda /bin/conda

Or link it into another location that is already in your PATH, like /usr/lib or /var/lib etc.
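For anyone who prefers to script this step, the same symlink can be created from Python. This is a sketch only: `link_into_dir` is a made-up helper, it leaves an existing entry untouched instead of overwriting it, and writing to a system directory such as /bin normally requires root privileges.

```python
import os


def link_into_dir(target, link_dir, name=None):
    """Create link_dir/name as a symlink to target unless it already exists.

    Returns (link_path, created), where created is False if something was
    already present at link_path.
    """
    link_path = os.path.join(link_dir, name or os.path.basename(target))
    if os.path.lexists(link_path):
        return link_path, False
    os.symlink(target, link_path)
    return link_path, True


# Rough equivalent of: ln -s /opt/anaconda3/bin/conda /bin/conda
# link_into_dir("/opt/anaconda3/bin/conda", "/bin")
```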

Additionally you can test the installation as I did:

Python interpreter test

42748-python-interpreter-test.jpg

PySpark and pandas interaction: from a Spark DataFrame to a pandas DataFrame

42749-data-rame-to-pandas.jpg
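Since the screenshot is not readable as text, here is a rough sketch of what such a conversion can look like. It is not the code from the screenshot. `toPandas()` and `limit()` are methods of a PySpark DataFrame; the helper name `spark_to_pandas`, the table name `my_table`, and the assumption that the paragraph runs under `%pyspark` with Zeppelin's predefined `sqlContext` are illustrative.

```python
def spark_to_pandas(spark_df, max_rows=1000):
    """Convert at most max_rows rows of a Spark DataFrame to a pandas DataFrame.

    toPandas() collects the rows onto the driver, so limit() is applied first
    to keep the result small enough for the driver's memory.
    """
    return spark_df.limit(max_rows).toPandas()


# Hypothetical usage inside a %pyspark paragraph ("my_table" is a placeholder):
#
#   spark_df = sqlContext.table("my_table")
#   pdf = spark_to_pandas(spark_df, max_rows=100)
#   print(pdf.describe())
```

The conversion only works once pandas is importable by the Python interpreter that the Spark interpreter is configured to use.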

I hope this helps you.

Kind regards,

Paul


Re: HELP with Pandas into zeppelin

New Contributor

43703-screenshot-16.png

Hi @Paul Hernandez, where can I find "path-to-your-python/bin/python"? When I type which anaconda, it returns /root/yes/bin/anaconda. So should I use this as the path? That would mean:

export PYTHONPATH="/root/yes/bin/anaconda/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/root/yes/bin/anaconda"
export PYSPARK_PYTHON="/root/yes/bin/anaconda"
export PYLIB="/var/opt/teradata/anaconda3/envs/py35/lib"

I can't find the folder used for PYLIB here.

Thank you for helping me.

Re: HELP with Pandas into zeppelin

New Contributor

Hi @enzo EL

In your case:

export PYTHONPATH="/root/yes/bin/anaconda/bin:/usr/hdp/current/spark-client/python/lib/py4j-0.8.2.1-src.zip" 
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/root/yes/bin/anaconda/bin/python"
export PYSPARK_PYTHON="/root/yes/bin/anaconda/bin/python"

This should work.
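Before restarting Zeppelin it may be worth checking that the configured path really points to an executable file. Because `which anaconda` printed /root/yes/bin/anaconda, that entry is possibly an executable inside /root/yes/bin rather than a directory; if so, the interpreter would sit next to it, not below it. That is an assumption drawn from the output quoted in this thread, not something confirmed. The helper names below are hypothetical.

```python
import os


def sibling_python(which_output, name="python"):
    """Return the path of `name` in the same directory as a `which` result."""
    return os.path.join(os.path.dirname(which_output), name)


def is_usable_interpreter(path):
    """True when path is an existing file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Derive a candidate from the `which anaconda` output and test both paths.
candidate = sibling_python("/root/yes/bin/anaconda")
for path in (candidate, "/root/yes/bin/anaconda/bin/python"):
    print("%s usable: %s" % (path, is_usable_interpreter(path)))
```

Whichever path is reported as usable is the one to put into PYSPARK_DRIVER_PYTHON, PYSPARK_PYTHON and zeppelin.pyspark.python.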

By the way, why did you install Anaconda in this location?

Kind regards, Paul

Re: HELP with Pandas into zeppelin

New Contributor
@Paul Hernandez

The location was the default, I don't know why. I have another question: I am trying to create the Python interpreter, but I can't edit the interpreter group. I am following your guide. How can I create it?

43704-screenshot-17.jpg

Re: HELP with Pandas into zeppelin

New Contributor
@enzo EL

What is your Zeppelin version? It looks like a 0.6.x version.

You may be able to use pandas without configuring the Python interpreter.

Re: HELP with Pandas into zeppelin

New Contributor

My version is 0.7.0.2.6 (see screenshot). I changed the path and restarted the notebook, but it doesn't work.

What should I do? I need pandas for plotting.

43705-screenshot-1.jpg