PySpark spark-submit error when using --files argument

New Contributor

I am running a PySpark job on a Spark 2.3 cluster with the following command:

spark-submit --deploy-mode cluster --master yarn --files ETLConfig.json PySpark_ETL_Job_v0.2.py

ETLConfig.json holds a parameter that is passed to the PySpark script. I refer to this config file in the main block as follows:

    configFilePath = os.path.join(SparkFiles.getRootDirectory(), 'ETLConfig.json')

    with open(configFilePath, 'r') as configFile:
        configDict = json.load(configFile)

But the command throws the following error:

No such file or directory: u'/tmp/spark-7dbe9acd-8b02-403a-987d-3accfc881a98/userFiles-4df4-5460-bd9c-4946-b289-6433-drgs/ETLConfig.json'

May I know what's wrong with my script? I also tried SparkFiles.get(), but that didn't work either.

Cloudera Employee

Hi @Senthilnathan Jegadeeswara,

I assume the file you're trying to submit is located in your home directory. Can you try opening the file by its name, relative to your home directory, instead of using SparkFiles?

For example (if the file is located at the root of your home directory):

    with open('ETLConfig.json') as configFile:
        configDict = json.load(configFile)

Best regards,

Damien.

New Contributor

Thanks for your reply. I tried what you suggested and it works, but I wanted to understand how the --files argument works when it sends the file across the cluster.

When I referred to the JSON file from my home directory, it worked when I submitted the job to the cluster in YARN client mode. Can you please let me know how to use the --files option to achieve the same thing?

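
A note on the --files question above, offered as a sketch rather than a confirmed fix for this job. As far as I understand it, on YARN the files passed with --files are copied into the working directory of each container, and in cluster mode that includes the container running the driver. That would explain why opening the file by its bare name works while the path built from SparkFiles.getRootDirectory() does not exist. The helper below is hypothetical (load_config and extra_dirs are names made up for illustration, and it assumes Python 3): it looks in the current working directory first, then in any extra directories you pass, and only then asks SparkFiles, if PySpark is importable and a SparkContext is active.

```python
import json
import os


def load_config(file_name, extra_dirs=()):
    """Return the parsed JSON config, trying likely locations in order.

    Hypothetical helper for illustration; not part of Spark.
    """
    # 1. The container's working directory, where YARN is assumed to
    #    place files shipped with --files.
    candidates = [os.path.join(os.getcwd(), file_name)]

    # 2. Any directories supplied by the caller (for example the
    #    directory the job was submitted from, in client mode).
    candidates += [os.path.join(d, file_name) for d in extra_dirs]

    # 3. The SparkFiles location, used by files added via sc.addFile().
    #    This is skipped if PySpark is missing or no SparkContext exists.
    try:
        from pyspark import SparkFiles
        candidates.append(SparkFiles.get(file_name))
    except Exception:
        pass

    for path in candidates:
        if os.path.isfile(path):
            with open(path, 'r') as config_file:
                return json.load(config_file)

    raise FileNotFoundError(
        "%s not found in any of: %s" % (file_name, candidates))
```

In the main block of the script this would be called as configDict = load_config('ETLConfig.json'), with the spark-submit command left unchanged. Checking the working directory first keeps the same code usable in both client and cluster mode, provided the assumption about where YARN places the file holds on your cluster.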