PySpark spark-submit error when using --files argument
Labels: Apache Spark
Created ‎09-19-2018 10:35 AM
I am running a PySpark job on a Spark 2.3 cluster with the following command:
spark-submit --deploy-mode cluster --master yarn --files ETLConfig.json PySpark_ETL_Job_v0.2.py
ETLConfig.json holds parameters passed to the PySpark script, and I read this config JSON file in the main block as below:
configFilePath = os.path.join(SparkFiles.getRootDirectory(), 'ETLConfig.json')
with open(configFilePath, 'r') as configFile:
    configDict = json.load(configFile)
But the command fails with the following error:
No such file or directory: u'/tmp/spark-7dbe9acd-8b02-403a-987d-3accfc881a98/userFiles-4df4-5460-bd9c-4946-b289-6433-drgs/ETLConfig.json'
May I know what's wrong with my script? I also tried SparkFiles.get(), but that didn't work either.
Created ‎09-19-2018 04:10 PM
Hi @Senthilnathan Jegadeeswara,
I assume the file you're trying to submit is located in your home directory. Can you try opening the file by its name relative to your home directory instead of using SparkFiles?
For example (if the file sits at the root of your home directory):
with open('ETLConfig.json') as configFile:
    configDict = json.load(configFile)
Best regards,
Damien.
Created ‎09-20-2018 02:15 PM
Hi @Damien Saillard,
Thanks for your reply. What you suggested is something I tried, and it works, but I wanted to understand how the --files argument works by shipping the file across the cluster.
Referring to the JSON file from the home directory works when I submit the job in yarn client mode. Can you please let me know how to use the --files option to achieve the same thing?
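For reference, here is a minimal sketch of the pattern that is generally expected to work with --files in yarn cluster mode. It is based on standard Spark behaviour rather than a verified fix from this thread, and the appName below is a placeholder: files shipped with --files are copied into the working directory of the YARN containers, so once the SparkContext exists the config can be located either through SparkFiles.get() or by its bare filename.
import json
import os
from pyspark import SparkFiles
from pyspark.sql import SparkSession

# Create the session first; SparkFiles can only resolve files shipped with
# --files once a SparkContext exists.
spark = SparkSession.builder.appName("ETLJob").getOrCreate()

# In yarn cluster mode the driver runs inside a YARN container, and
# ETLConfig.json (passed via --files) lands in that container's working
# directory, so either lookup below should find it.
configFilePath = SparkFiles.get('ETLConfig.json')
if not os.path.exists(configFilePath):
    configFilePath = 'ETLConfig.json'  # fall back to the working directory

with open(configFilePath, 'r') as configFile:
    configDict = json.load(configFile)
This would be submitted with the same command as in the original post:
spark-submit --deploy-mode cluster --master yarn --files ETLConfig.json PySpark_ETL_Job_v0.2.py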
