Spark-submit suddenly doesn't recognize modules on the cluster
Labels: Apache Spark, Apache YARN
Created on 09-30-2022 12:46 AM - edited 09-30-2022 02:04 AM
Hi,
We've been experiencing a peculiar issue for the past few days: our Python scripts started failing when run with spark-submit. At first glance the logs show syntax errors in code that uses imported modules, but further troubleshooting revealed that the modules themselves are the problem. When we comment out the pieces of code that throw syntax errors, we instead get import errors ("No module named ...").
We have two Python versions on our cluster, but spark-submit appears to be using the correct one, with all our modules installed. The same scripts run fine in the pyspark shell; for some reason, though, spark-submit does not recognize the imported modules.
What's more, YARN doesn't register these jobs as failed; they aren't logged at all, probably because they crash as soon as the imports run. So we have no access to YARN logs for these jobs.
Any insight would be greatly appreciated.
Thanks.
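For reference, one way to compare the interpreter and module search path seen by each entry point (the `python3` name below is an assumption; substitute whichever binary spark-submit actually resolves to):

```shell
# Print the interpreter path and the module search path for a given binary;
# comparing this output between the pyspark shell's interpreter and the one
# spark-submit launches can reveal a version or site-packages mismatch.
python3 -c 'import sys; print(sys.executable); print("\n".join(sys.path))'
```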
Created on 09-30-2022 02:01 AM - edited 09-30-2022 02:09 AM
Hi @imule
Add the following parameters to your spark-submit command:
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=<python3_path>
--conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=<python3_path>
Note:
1. Ensure <python3_path> exists on all nodes.
2. Ensure the required modules are installed on each node.
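Putting it together, a sketch of a full invocation; the interpreter path, master, deploy mode, and script name below are placeholders, not values from this thread:

```shell
# Sketch only: /opt/python3/bin/python3 and my_job.py are placeholder values;
# substitute the interpreter path that exists on every node and your script.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/python3/bin/python3 \
  --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/opt/python3/bin/python3 \
  my_job.py
```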
Created 10-02-2022 10:29 AM
Firstly, I would like to ask if there were any changes in the cluster, e.g. patching or RPM updates?
If the spark-submit job was running successfully before, you need to know whether this could be linked to the Python version.
On the edge/gateway node:
# python -V
Python 3.7.7
# conda deactivate
# python -V
Python 2.7.5
Then try relaunching your spark-submit.
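In the same spirit, a quick sanity check on the launching node (assuming `python3` is the intended interpreter; a conda activate/deactivate can silently change the answer):

```shell
# Show which interpreter is first on PATH, its version, and whether an
# explicit override is set for PySpark.
which python3
python3 -V
echo "PYSPARK_PYTHON=${PYSPARK_PYTHON:-<not set>}"
```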
Created 10-07-2022 01:58 AM
@imule, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur, Community Manager