Add custom python library path to Pyspark code
- Labels: Apache Oozie
Created 02-25-2019 03:12 PM
On my Hadoop cluster, the Anaconda package is installed in a path other than the default Python path. I get the error below when I try to import numpy in PySpark:
ImportError: No module named numpy
I am invoking PySpark through Oozie.
I tried to supply the custom Python path using the approaches below.
Using Oozie properties:
<property> <name>oozie.launcher.mapreduce.map.env</name> <value>PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7</value> </property>
Using the spark-opts tag:
<spark-opts>spark.yarn.appMasterEnv.PYSPARK_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.python=/var/opt/teradata/anaconda2/bin/python2.7 --conf spark.pyspark.driver.python=/var/opt/teradata/anaconda2/bin/python2.7</spark-opts>
Neither approach works.
A plain Python script runs fine; the problem only appears when the code runs through PySpark.
I even set the shebang line of the PySpark script to #! /usr/bin/env /var/opt/teradata/anaconda2/bin/python2.7
When I print sys.path in my PySpark code, it still shows the default paths below:
[ '/usr/lib/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/local/lib64/python2.7/site-packages', '/usr/local/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages']
Please suggest a solution.
Created 08-01-2020 09:08 AM
@selvaprabhu_k The same issue has been answered in the threads below; please have a look.
- https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90427#M216...
- https://community.cloudera.com/t5/Support-Questions/Jupyter-notebook-gt-ImportError-No-module-named-...
Cheers!
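Before changing more settings, it may help to confirm which interpreter the driver and the executors actually run, since the sys.path printed in the question points at the system Python rather than Anaconda. The following is only a minimal diagnostic sketch, not a fix: the helper names interpreter_report and executor_reports are hypothetical, and it assumes an existing SparkContext is passed in as sc.

```python
import importlib
import sys


def interpreter_report(module_name="numpy"):
    """Describe the Python process this function runs in (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,      # interpreter binary in use
        "version": sys.version.split()[0],
        "module": module_name,
        "importable": importable,          # True if module_name can be imported
        "sys_path": list(sys.path),
    }


def executor_reports(sc, module_name="numpy", partitions=4):
    """Run interpreter_report inside executor tasks and collect the results.

    Assumes `sc` is an already created pyspark SparkContext.
    """
    rdd = sc.parallelize(range(partitions), partitions)
    return rdd.map(lambda _: interpreter_report(module_name)).collect()
```

Inside the PySpark script, printing interpreter_report() shows the driver side and executor_reports(sc) shows the executor side. If either still reports an executable under /usr/bin rather than /var/opt/teradata/anaconda2, the PYSPARK_PYTHON setting is presumably not reaching that process.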
