Support Questions
Find answers, ask questions, and share your expertise

Add a custom Python library path to PySpark code

In my Hadoop cluster, the Anaconda package is installed in a path other than the default Python path. I get the error below when I try to import numpy in PySpark:

ImportError: No module named numpy

I am invoking PySpark using Oozie.

I tried to set this custom Python library path using the approaches below.

Using Oozie configuration properties:

```xml
<property>
  <name>oozie.launcher.mapreduce.map.env</name>
  <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value>
</property>
```

Using the spark-opts tag:

```xml
<spark-opts>
  spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7
  --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7
  --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7
  --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7
</spark-opts>
```
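For reference, each property passed through spark-opts needs its own --conf flag, and the first entry in the snippet above lacks one. A corrected sketch (the spark.executorEnv.PYSPARK_PYTHON line is an assumption on my part, commonly added so executors also inherit the interpreter; it is not from the original post):

```xml
<spark-opts>
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7
  --conf spark.executorEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7
</spark-opts>
```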


Nothing works.

When I run a plain Python script it works fine; the problem only appears when the path has to reach PySpark.

I even set the interpreter in the PySpark script's shebang line as #!/usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7

When I print sys.path in my PySpark code, it still shows only the default paths below:

```python
['/usr/lib/python27.zip',
 '/usr/lib64/python2.7',
 '/usr/lib64/python2.7/plat-linux2',
 '/usr/lib64/python2.7/lib-tk',
 '/usr/lib64/python2.7/lib-old',
 '/usr/lib64/python2.7/lib-dynload',
 '/usr/lib64/python2.7/site-packages',
 '/usr/local/lib64/python2.7/site-packages',
 '/usr/local/lib/python2.7/site-packages',
 '/usr/lib/python2.7/site-packages']
```
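A quick way to confirm which interpreter the driver is actually running is to print sys.executable alongside sys.path (a minimal diagnostic sketch, runnable in any Python session; the Anaconda path mentioned in the comment is the one from this post):

```python
import sys

# The interpreter binary running this code. If this does not point at
# /var/opt/teradata/anaconda2/bin/python2.7, the PYSPARK_PYTHON setting
# was not picked up by the process that launched this script.
print(sys.executable)

# The module search path; the site-packages entries here determine
# which numpy installation (if any) can be imported.
for entry in sys.path:
    print(entry)
```

Running this both in a plain Python session and inside the PySpark job makes it easy to see where the environment diverges.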

Kindly suggest a solution.

1 REPLY 1

@selvaprabhu_k The same issue has been answered in the thread below; please have a look.


Cheers!
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.