Member since 09-18-2018
3 Posts · 0 Kudos Received · 0 Solutions
09-21-2018
08:30 AM
Thanks for your reply. What you suggested is something I tried, and it works, but I wanted to understand how the --files argument works by shipping the file across the cluster. When I referred to the JSON file from my home directory, it worked when I submitted the job in yarn client mode. Can you please let me know how to use the --files option to achieve the same thing?
09-20-2018
02:15 PM
Hi @Damien Saillard, thanks for your reply. What you suggested is something I tried, and it works, but I wanted to understand how the --files argument works by shipping the file across the cluster. When I referred to the JSON file from my home directory, it worked when I submitted the job in yarn client mode. Can you please let me know how to use the --files option to achieve the same thing?
09-19-2018
10:35 AM
I am running a PySpark job on a Spark 2.3 cluster with the following command:

    spark-submit \
        --deploy-mode cluster \
        --master yarn \
        --files ETLConfig.json \
        PySpark_ETL_Job_v0.2.py

ETLConfig.json holds a parameter passed to the PySpark script, and I refer to this config JSON file in the main block as below:

    configFilePath = os.path.join(SparkFiles.getRootDirectory(), 'ETLConfig.json')
    with open(configFilePath, 'r') as configFile:
        configDict = json.load(configFile)

But the command throws the following error:

    No such file or directory: u'/tmp/spark-7dbe9acd-8b02-403a-987d-3accfc881a98/userFiles-4df4-5460-bd9c-4946-b289-6433-drgs/ETLConfig.json'

May I know what's wrong with my script? I also tried the SparkFiles.get() call, but it didn't work either.
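For what it's worth, a minimal sketch of the behavior usually relied on here: in YARN cluster mode the driver itself runs inside a YARN container, and files shipped with --files are localized into that container's working directory, so opening the bare file name is typically enough (SparkFiles.getRootDirectory() resolves a driver-side staging path that does not apply in that case). The demo config written at the top of the sketch is only a stand-in for the real ETLConfig.json that --files would place there.

```python
import json

# Stand-in for the ETLConfig.json that --files would localize into the
# container's working directory (assumption: YARN cluster mode).
with open('ETLConfig.json', 'w') as f:
    json.dump({'source': 'demo'}, f)

# Because the shipped file lands in the driver container's current working
# directory, a bare relative path is all the script needs:
with open('ETLConfig.json') as f:
    configDict = json.load(f)

print(configDict['source'])
```

In yarn client mode, by contrast, the driver runs on the submitting machine, which is why referring to the file by its local (home-directory) path worked there.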
Labels:
- Apache Spark