
Add custom python library path to Pyspark code


On my Hadoop cluster, Anaconda was installed in a location other than the default Python path. I get the error below when I try to import numpy in PySpark:

ImportError: No module named numpy

I am invoking PySpark through Oozie.

I tried to supply the custom Python path using the approaches below.

Using Oozie tags:

<property>
  <name></name>
  <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value>
</property>

Using the Spark options tag:

<spark-opts>spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>
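For reference, spark-submit expects every property to be introduced by its own --conf flag, and the first property in the options above has none. Whether that is the cause here is not confirmed. The sketch below only shows one way to generate the option string so that each entry gets its flag. The helper name build_spark_opts is hypothetical, and the property list is simply the four properties from the attempt above.

```python
def build_spark_opts(python_path):
    """Build a spark-opts string in which every property has its own --conf flag.

    `build_spark_opts` is a hypothetical helper, not part of Spark or Oozie.
    The properties are the four used in the question.
    """
    properties = [
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON",
        "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON",
        "spark.pyspark.python",
        "spark.pyspark.driver.python",
    ]
    # Prefix each key=value pair with --conf, then join with single spaces.
    return " ".join(
        "--conf {0}={1}".format(name, python_path) for name in properties
    )


if __name__ == "__main__":
    print(build_spark_opts("/var/opt/teradata/anaconda2/bin/python2.7"))
```

The printed string can be pasted between the <spark-opts> tags of the workflow.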

Nothing works.

When I run a plain Python script, it works fine. The problem is passing the path to PySpark.

I even put this in the shebang line of the PySpark script: #! /usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7

When I print sys.path in my PySpark code, it still shows the default paths below:

[ '/usr/lib/', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/local/lib64/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages']
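Besides sys.path, it can help to check which interpreter binary is actually running, on the driver and on the executors. The following is a minimal diagnostic sketch. The function name describe_interpreter is hypothetical, and the executor-side line assumes an existing SparkContext named sc.

```python
import importlib
import sys


def describe_interpreter(module_name="numpy"):
    """Report which Python is running and whether `module_name` can be imported."""
    try:
        importlib.import_module(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    # Driver-side view.
    print(describe_interpreter())
    # Executor-side view (assumes an existing SparkContext named `sc`):
    # print(sc.parallelize(range(2), 2).map(lambda _: describe_interpreter()).collect())
```

If the reported executable is not the Anaconda binary, the PYSPARK_PYTHON setting is not reaching that process.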

Kindly suggest a solution.


Master Guru

@selvaprabhu_k The same issue has been answered in the thread below; please have a look.
