Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

copyToLocalFile fails on spark-submit in cluster mode

avatar
New Contributor

I have been trying to copy some HDFS files over to the local file system using the FileSystem API's copyToLocalFile function. But when I run the spark job in the cluster mode, it is unable to write to my local file system, even with the absolute path. It usually fails with a permission denied error, though I'm running the spark-submit from the same user and the user would obviously have access to his/her home folder. I've checked the user by using Process("whoami") and it's my user (and not yarn or any other user who probably did not have access to my home folder)

I've tried to explicitly set the working directory to the given folder but to no avail. It says it cannot create the directory though the directory already exists on the local file system. On checking the working directory from Process("pwd") it points to a usercache location specific to the application.

I've alternatively tried to even give the absolute path with the mount point, but to no avail. I've also tried to execute the Process("hdfs dfs -copyToLocal <src> <dest>") as a workaround and that hasn't worked either.

And I have no issues when I run the same commands via spark-shell or spark-submit in the client mode. YARN seems to be somehow affecting the process, am I missing something? What are the alternative to get it onto the local file system via code?

4 REPLIES 4

avatar
Expert Contributor

@Pranav Singhania have you tried hdfs dfs -get <hdfs path> <localpath>?

avatar
New Contributor

@dhieru singh As a command from the server it does work, it's from the spark-submit that it fails to do so.

avatar
New Contributor

Was there any answer for this?

I will do ssh to any edge node and then run a shell script to copy files from hdfs to local , not sure if there is any other right way of doing it?

avatar
Community Manager

Hi @prem1301 as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: