I have been trying to copy some HDFS files to the local file system using the FileSystem API's copyToLocalFile function. When I run the Spark job in cluster mode, it is unable to write to my local file system, even with an absolute path. It usually fails with a permission-denied error, even though I'm running spark-submit as the same user, and that user obviously has access to his/her own home folder. I've checked the user with Process("whoami") and it is my user (not yarn or some other user that might lack access to my home folder).

I've tried to explicitly set the working directory to the target folder, but to no avail: it says it cannot create the directory, even though the directory already exists on the local file system. Checking the working directory with Process("pwd") shows that it points to a usercache location specific to the application. I've also tried giving the absolute path including the mount point, again to no avail. As a workaround I've tried executing Process("hdfs dfs -copyToLocal <src> <dest>"), and that hasn't worked either.

I have no issues when I run the same commands via spark-shell, or with spark-submit in client mode. YARN seems to be affecting the process somehow. Am I missing something? What are the alternatives to get the files onto the local file system via code?
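For reference, this is a minimal sketch of what I'm doing, using the standard Hadoop FileSystem API; the paths here are placeholders, not my actual paths:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import sys.process._

val conf = new Configuration()
val fs = FileSystem.get(conf)

// copyToLocalFile(delSrc, src, dst): works in client mode,
// fails with "permission denied" in cluster mode.
fs.copyToLocalFile(
  false,                                   // don't delete the HDFS source
  new Path("hdfs:///user/me/data/file"),   // placeholder source path
  new Path("file:///home/me/output/file")  // placeholder local destination
)

// The shell workaround behaves the same way in cluster mode:
"hdfs dfs -copyToLocal /user/me/data/file /home/me/output/file".!
```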
@Sebastien Chausson This is precisely the kind of issue I am facing too. The initial tasks finish within seconds, while the rest take a massive amount of time, sometimes hours, and eventually fail. Any improvements or discoveries on your end?