Hello, I have a scenario where I need to copy a file from HDFS to the local file system on an edge server.
When I run the code in cluster mode with spark-submit, I get the following error:
User class threw exception: java.io.IOException: Mkdirs failed to create /some_specific/path/in_edge_server (exists=false, cwd=file:/some/path/yarn/nm/usercache/1234/appcache/application_123456744/container_t1234)
Is it looking for the destination path on the node where the execution takes place and, since the path does not exist there, trying to create the directory and failing?
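One way to confirm where "local" points is to log the driver's host and working directory right before the copy. The following is a minimal sketch; `DriverLocation` is a hypothetical helper, not part of Spark or Hadoop, and the path is the placeholder from the error above. In cluster mode, its output should end up in the driver container's stdout log rather than on the edge node's console.

```scala
import java.net.InetAddress
import java.nio.file.{Files, Paths}
import scala.util.Try

// Hypothetical diagnostic helper (not part of Spark or Hadoop). Calling it from
// the driver just before fs.copyToLocalFile shows which machine's local disk
// the "local" destination actually refers to.
object DriverLocation {
  def describe(destPath: String): String = {
    // Host name of the machine running this JVM; falls back if the lookup fails.
    val host = Try(InetAddress.getLocalHost.getHostName).getOrElse("unknown-host")
    // Working directory of this JVM (a YARN container directory in cluster mode).
    val cwd = System.getProperty("user.dir")
    // Existence is checked on THIS host's file system, not on the edge node.
    val exists = Files.exists(Paths.get(destPath))
    s"host=$host cwd=$cwd destExists=$exists"
  }

  def main(args: Array[String]): Unit =
    // Placeholder path taken from the error message in the question.
    println(describe("/some_specific/path/in_edge_server"))
}
```

If the reported host is a cluster node rather than the edge server, the copy is targeting that node's disk.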
Below is the function being used:
fs.copyToLocalFile(new Path(src_HDFSPath), new Path(dest_edgePath))
Is there a way to copy the file from HDFS to a specific path on the edge server when the program runs in cluster mode?
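In YARN cluster mode, the driver runs inside a container on one of the cluster nodes, so `copyToLocalFile` writes to that node's disk. Two common workarounds are to submit with `--deploy-mode client` so the driver runs on the edge node, or to leave the file in HDFS and pull it from the edge node after the job finishes. (By default, spark-submit in YARN cluster mode waits for the application to complete; see `spark.yarn.submit.waitAppCompletion`.) The sketch below illustrates the second option under those assumptions. `PullFromHdfs` is a hypothetical helper that shells out to the standard `hdfs dfs -get <src> <localdst>` command and is meant to run on the edge node, not inside the Spark job.

```scala
import scala.sys.process._

// Hypothetical helper meant to run on the EDGE NODE after the cluster-mode
// job has finished. It pulls the file from HDFS with the standard
// `hdfs dfs -get <src> <localdst>` command instead of copying from inside the job.
object PullFromHdfs {
  // Builds the command as separate arguments, so paths are not shell-parsed.
  def getCommand(hdfsSrc: String, localDest: String): Seq[String] =
    Seq("hdfs", "dfs", "-get", hdfsSrc, localDest)

  // Runs the command on this machine and returns its exit code (0 = success).
  def pull(hdfsSrc: String, localDest: String): Int =
    getCommand(hdfsSrc, localDest).!

  def main(args: Array[String]): Unit = {
    require(args.length == 2, "usage: PullFromHdfs <src_HDFSPath> <dest_edgePath>")
    val exitCode = pull(args(0), args(1))
    if (exitCode != 0) sys.error(s"hdfs dfs -get failed with exit code $exitCode")
  }
}
```

Running the same `hdfs dfs -get` command directly from a shell script on the edge node, right after spark-submit returns, achieves the same result without any Scala.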
Created 04-07-2023 01:14 AM
@Sanchari
It would help if you shared a snippet of your code.
Logically, I think you copy FROM --> TO, so the function would be:
fs.copyFromLocalFile(new Path(src_HDFSPath), new Path(dest_edgePath))
Disclaimer: I am not a Spark/Python developer.
Created 04-07-2023 04:59 AM
@Shelton Below is the Hadoop fs function being used:
copyToLocalFile(new Path(src_HDFSPath), new Path(dest_edgePath))
Please note that my goal is to copy the file from HDFS to the edge server's local file system when I run the Spark job in cluster mode.
Created 04-07-2023 12:47 PM
@Sanchari
I suspect the "java.io.IOException: Mkdirs failed to create" error is due to permissions on the edge node.
This assumes the HDFS copy is being run as the hdfs user and your edge node directory belongs to a different user/group.
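To test the permissions theory from inside the job, a pre-flight check can report whether the destination directory exists and is writable by the user the JVM runs as. This is a minimal sketch; `DestinationCheck` is hypothetical (not a Hadoop API). It checks the host the driver JVM is actually running on, which in cluster mode is a cluster node rather than the edge node, so it also helps separate a permissions problem from a wrong-host problem.

```scala
import java.net.InetAddress
import java.nio.file.{Files, Path, Paths}
import scala.util.Try

// Hypothetical pre-flight check (not a Hadoop API) to run just before
// fs.copyToLocalFile. It reports whether the destination directory exists and
// is writable by the current user, on whichever host this JVM is running.
object DestinationCheck {
  def check(destDir: String): Either[String, Path] = {
    val dir  = Paths.get(destDir)
    val user = System.getProperty("user.name")
    val host = Try(InetAddress.getLocalHost.getHostName).getOrElse("unknown-host")
    if (!Files.isDirectory(dir))
      Left(s"$destDir does not exist (or is not a directory) on $host")
    else if (!Files.isWritable(dir))
      Left(s"$destDir exists on $host but is not writable by user $user")
    else
      Right(dir)
  }

  def main(args: Array[String]): Unit =
    check(args.headOption.getOrElse("/some_specific/path/in_edge_server")) match {
      case Left(problem) => sys.error(problem)
      case Right(dir)    => println(s"OK to copy into $dir")
    }
}
```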
Just for test purposes, can you do the following on the edge node:
Then run chmod on the destination path.
Finally, rerun your spark-submit and let me know.
Created 04-09-2023 10:00 PM
@Shelton Please note that the directory on the edge server to which I am trying to copy the file already exists, so ideally it should not try to perform a mkdir operation. As I mentioned in my first post, it seems to be looking for the directory on the node where the code is being executed (whose container directory appears as the cwd in the error), and since it cannot find it there, it tries to create one. What I need is for it to use the directory on the edge server instead of looking on that node.
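A small illustration of why the existing edge-node directory does not help: if the destination is an absolute path, as in the error message, it is not built from the working directory at all. The cwd in the message only identifies the container (and therefore the host) the driver ran in, and the absolute path is then looked up on that host's disk. The sketch below shows the standard path-resolution rule using `java.nio` as a stand-in; `AbsoluteVsCwd` is a hypothetical name and the paths are placeholders from the error.

```scala
import java.nio.file.Paths

// Illustration with placeholder paths from the error message: an absolute
// destination is not built from the working directory. The cwd reported in the
// error only tells you which host and container the driver ran in.
object AbsoluteVsCwd {
  def resolveAgainst(cwd: String, dest: String): String =
    Paths.get(cwd).resolve(dest).toString

  def main(args: Array[String]): Unit = {
    val cwd = "/some/path/yarn/nm/usercache/1234/appcache/application_123456744/container_t1234"
    println(resolveAgainst(cwd, "/some_specific/path/in_edge_server")) // absolute: unchanged
    println(resolveAgainst(cwd, "relative/dir"))                        // relative: appended to cwd
  }
}
```

So the fix is less about the path itself and more about where the driver runs: either run the driver on the edge node (client mode) or fetch the file from HDFS on the edge node after the job.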