
Add custom python library path to Pyspark code

On my Hadoop cluster, the Anaconda package is installed in a location other than the default Python path. I get the error below when I try to import numpy in PySpark:

ImportError: No module named numpy

I am invoking PySpark through Oozie.

I tried to supply the custom Python path using the approaches below.

Using an Oozie property:

<property>
  <name>oozie.launcher.mapreduce.map.env</name>
  <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value>
</property>

Using the spark-opts tag:

<spark-opts>spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>
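For comparison, here is a minimal sketch of how the spark-opts string could be assembled so that every setting carries its own --conf prefix, assuming Oozie passes these options through in the same form spark-submit expects. Note that the first setting in the string above has no --conf in front of it. The helper name build_spark_opts is hypothetical and used only for illustration; the configuration keys and the interpreter path are the ones already shown above.

```python
# Sketch only: builds a spark-opts string in which each setting is
# passed as its own "--conf key=value" pair.
# build_spark_opts is a hypothetical helper, not part of Spark or Oozie.

ANACONDA_PYTHON = "/var/opt/teradata/anaconda2/bin/python2.7"


def build_spark_opts(python_path):
    """Return a single string of --conf options pointing at python_path."""
    settings = [
        ("spark.yarn.appMasterEnv.PYSPARK_PYTHON", python_path),
        ("spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON", python_path),
        ("spark.pyspark.python", python_path),
        ("spark.pyspark.driver.python", python_path),
    ]
    return " ".join("--conf {0}={1}".format(key, value) for key, value in settings)


print(build_spark_opts(ANACONDA_PYTHON))
```

The printed string is what would go between the spark-opts tags; whether this alone changes the interpreter used on the cluster is not something this sketch can confirm.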


Neither approach works.

When I run a plain Python script, it works fine. The problem only appears when the path has to be passed to PySpark.

I even put the interpreter in the shebang line of the PySpark script: #! /usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7

When I print sys.path in my PySpark code, it still shows the default paths below:

[ '/usr/lib/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/local/lib64/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages']
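A small diagnostic along these lines can show which interpreter is actually running, which is more direct than reading sys.path alone. This is only a sketch: the function name interpreter_report is hypothetical, and it uses nothing beyond the standard library and an optional numpy import.

```python
import sys


def interpreter_report():
    """Describe the interpreter running this code and where numpy comes from."""
    try:
        import numpy
        numpy_location = numpy.__file__
    except ImportError:
        # numpy is not visible to this interpreter
        numpy_location = None
    return {
        "executable": sys.executable,  # path of the running Python binary
        "path": list(sys.path),        # module search path
        "numpy": numpy_location,       # None if the import failed
    }


if __name__ == "__main__":
    report = interpreter_report()
    print("executable: {0}".format(report["executable"]))
    print("numpy: {0}".format(report["numpy"]))

# Assumption: inside a PySpark job with an existing SparkContext named sc,
# the same function can be run on the executors, for example:
#   sc.parallelize(range(2), 2).map(lambda _: interpreter_report()).collect()
```

If the reported executable is /usr/bin/python rather than the Anaconda binary, the PYSPARK_PYTHON setting did not reach that process. Running the function on both the driver and the executors shows which side is still using the default interpreter.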

Could anyone suggest a solution?