I have been trying to copy some HDFS files to the local file system using the FileSystem API's copyToLocalFile function. When I run the Spark job in cluster mode, it is unable to write to my local file system, even with an absolute path. It usually fails with a permission-denied error, even though I'm running spark-submit as the same user, and that user obviously has access to his/her own home folder. I've checked the user with Process("whoami") and it is my user (not yarn or some other user that might lack access to my home folder).

I've tried to explicitly set the working directory to the target folder, but to no avail: it says it cannot create the directory, even though the directory already exists on the local file system. Checking the working directory with Process("pwd") shows that it points to a usercache location specific to the application. I've also tried giving the absolute path including the mount point, again to no avail. As a workaround I've tried executing Process("hdfs dfs -copyToLocal <src> <dest>"), and that hasn't worked either.

I have no issues when I run the same commands via spark-shell, or with spark-submit in client mode. YARN seems to be affecting the process somehow. Am I missing something? What are the alternatives to get the files onto the local file system via code?
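For reference, this is a minimal sketch of what I'm doing, using the standard Hadoop FileSystem API; the paths here are placeholders, not my actual paths:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import sys.process._

val conf = new Configuration()
val fs = FileSystem.get(conf)

// copyToLocalFile(delSrc, src, dst): works in client mode,
// fails with "permission denied" in cluster mode.
fs.copyToLocalFile(
  false,                                   // don't delete the HDFS source
  new Path("hdfs:///user/me/data/file"),   // placeholder source path
  new Path("file:///home/me/output/file")  // placeholder local destination
)

// The shell workaround behaves the same way in cluster mode:
"hdfs dfs -copyToLocal /user/me/data/file /home/me/output/file".!
```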
@Sebastien Chausson This is precisely the kind of issue I am facing too. The initial tasks finish within seconds, while the rest take a massive amount of time, sometimes hours, and eventually fail. Any improvements or discoveries on your end?