Created 09-19-2018 10:35 AM
I am running a PySpark job on a Spark 2.3 cluster with the following command.
spark-submit --deploy-mode cluster --master yarn --files ETLConfig.json PySpark_ETL_Job_v0.2.py
ETLConfig.json holds parameters for the PySpark script, and I read this config JSON file in the main block as below:
import os, json
from pyspark import SparkFiles

configFilePath = os.path.join(SparkFiles.getRootDirectory(), 'ETLConfig.json')
with open(configFilePath, 'r') as configFile:
    configDict = json.load(configFile)
But the command throws the following error: No such file or directory: u'/tmp/spark-7dbe9acd-8b02-403a-987d-3accfc881a98/userFiles-4df4-5460-bd9c-4946-b289-6433-drgs/ETLConfig.json'. May I know what's wrong with my script? I also tried SparkFiles.get(), but it didn't work either.
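A minimal diagnostic sketch along these lines (the SparkSession setup and app name are assumptions, not part of the original job) can be dropped into the script to print both candidate locations, so the YARN driver log shows where the file shipped with --files actually lands:

import os
from pyspark import SparkFiles
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("ETLConfigDebug").getOrCreate()

    # Compare the container's working directory with the SparkFiles root
    # to see which one actually contains ETLConfig.json.
    print("cwd:", os.getcwd())
    print("cwd contents:", os.listdir(os.getcwd()))
    print("SparkFiles root:", SparkFiles.getRootDirectory())
    print("SparkFiles.get path:", SparkFiles.get("ETLConfig.json"))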
Created 09-19-2018 04:10 PM
Hi @Senthilnathan Jegadeeswara,
I assume the file you're trying to submit is located in your home directory. Can you try opening the file with its name relative to your home directory instead of using SparkFiles?
For example (if the file is located at the root of your home directory):
with open('ETLConfig.json') as configFile:
    configDict = json.load(configFile)
Best regards,
Damien.
Created 09-20-2018 02:15 PM
Hi @Damien Saillard,
Thanks for your reply. I tried what you suggested and it works, but I wanted to understand how the --files argument works by distributing the file across the cluster.
When I referred to the JSON file from the home directory, it worked when I submitted the job in YARN client mode. Can you please let me know how to use the --files option to achieve the same thing?
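For completeness, here is a sketch of one commonly used pattern, assuming the --files flag from the original command (the SparkSession setup and the load_config helper are illustrative, not from the original script). In YARN cluster mode, entries passed with --files are localized into the container's working directory, so reading the file by its bare name generally works, with SparkFiles.get() as a fallback:

# Submitted as in the original post, e.g.:
#   spark-submit --deploy-mode cluster --master yarn \
#     --files ETLConfig.json PySpark_ETL_Job_v0.2.py
import json
import os
from pyspark import SparkFiles
from pyspark.sql import SparkSession

def load_config(name="ETLConfig.json"):
    # In YARN cluster mode, --files entries are localized into the
    # container's working directory, so the bare name usually resolves.
    if os.path.exists(name):
        path = name
    else:
        # Fall back to the SparkFiles root; exact behavior can differ
        # between client and cluster mode and across Spark versions.
        path = SparkFiles.get(name)
    with open(path, "r") as f:
        return json.load(f)

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PySpark_ETL_Job").getOrCreate()
    configDict = load_config()
    print(configDict)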