RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

Cloudera Employee

So, should he be passing "file://" in his argument? Does a path without a scheme default to a local file?

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

Master Collaborator

Without a scheme, it should be treated as a local file, yes. The file seems to be distributed just fine; there may be a permissions or visibility issue somewhere, though.
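As a plain-Java illustration of the scheme question (this does not reproduce Spark's own path handling), a minimal sketch showing that a scheme-less absolute path and a "file://" URI name the same local file. The script path is a hypothetical example.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: compares a scheme-less path with a file:// URI using the
// JDK, not Spark. The script location below is hypothetical.
class SchemeCheck {

    // Resolve either "/opt/scripts/run.sh" or "file:///opt/scripts/run.sh"
    // to a normalized local Path.
    static Path toLocalPath(String spec) {
        if (spec.startsWith("file:")) {
            return Paths.get(URI.create(spec)).normalize();
        }
        return Paths.get(spec).normalize();
    }

    public static void main(String[] args) {
        String plain = "/opt/scripts/run.sh";              // hypothetical script path
        String withScheme = "file:///opt/scripts/run.sh";  // same file, explicit scheme
        System.out.println(toLocalPath(plain).equals(toLocalPath(withScheme)));
    }
}
```

On a Unix-like system this prints true, which is consistent with prefixing "file://" making no difference to the failure.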

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

Cloudera Employee

Hi Kevin,

I'm curious: what are your imports for the sample code you provided? I can't find "Paths" in the documentation.

Scott

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

Master Collaborator

Paths (java.nio.file.Paths) is the standard Java 7 class for parsing file paths, so that part looks fine.
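For reference, a minimal sketch of the imports and basic usage of java.nio.file.Paths (Java 7+). The directory and file names are hypothetical and are not taken from the original sample code.

```java
import java.nio.file.Path;   // the type returned by Paths.get
import java.nio.file.Paths;  // the factory class asked about above

// Minimal Paths example; directory and file names are hypothetical.
class PathsDemo {

    // Join a directory and a file name into a single Path.
    static Path scriptPath(String dir, String fileName) {
        return Paths.get(dir, fileName);
    }

    public static void main(String[] args) {
        Path script = scriptPath("/opt/scripts", "transform.R");
        // Outputs shown are for a Unix-like system.
        System.out.println(script);               // /opt/scripts/transform.R
        System.out.println(script.getFileName()); // transform.R
        System.out.println(script.getParent());   // /opt/scripts
        System.out.println(script.isAbsolute());  // true
    }
}
```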

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

Explorer

To remove R from the equation, I tried it with a simple bash script and got the same behavior: it works with local[4] but fails on yarn-client with either "No such file or directory" or "Permission denied", depending on the node.
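To see where those two messages come from, here is a rough single-machine approximation of what RDD.pipe() does for one partition: fork the command, write each element as a line to its stdin, and collect its stdout lines. This is an assumption-labeled sketch, not Spark's implementation; "cat" stands in for the user's bash or R script. The relevant point is that launching the process fails on any machine where the script is missing or not executable.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Single-machine approximation of RDD.pipe() for one partition.
// NOT Spark's implementation; "cat" stands in for the user's script.
class PipeSketch {

    static List<String> pipePartition(List<String> command, List<String> elements)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // surface the script's stderr
        // start() throws IOException if the OS cannot launch the command, e.g.
        // "error=2, No such file or directory" or "error=13, Permission denied".
        Process proc = pb.start();

        // Write elements on a separate thread so a script that prints a lot
        // before reading all of its input cannot deadlock the reader below.
        Thread writer = new Thread(() -> {
            try (BufferedWriter w = new BufferedWriter(
                    new OutputStreamWriter(proc.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String e : elements) {
                    w.write(e);
                    w.write('\n'); // one element per line
                }
            } catch (IOException ignored) {
                // The process may exit early and close its stdin.
            }
        });
        writer.start();

        // Each line the script prints becomes one output element.
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.add(line);
            }
        }
        writer.join();
        int exit = proc.waitFor();
        if (exit != 0) {
            throw new IllegalStateException("command exited with status " + exit);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pipePartition(Arrays.asList("cat"), Arrays.asList("a", "b", "c")));
    }
}
```

Run locally, this prints [a, b, c]; pointing the command at a path that does not exist on the current machine makes start() throw instead.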

 

I also tried prefixing the path with the "file://" scheme; again, same behavior.

I'm with Scott: it looks like a permissions issue, but I wasn't involved in setting up the permissions, so I'm not certain. We'll have Scott look at it offline. Thanks, everyone.
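One way to narrow down a per-node permissions or visibility problem is to check the script path on each worker, running as the same OS user the executors run as. The helper below is hypothetical (not part of Spark or CDH), and its messages are a best guess at which error each case would produce.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical per-node diagnostic; run it as the executor's OS user.
class ScriptCheck {

    static String diagnose(Path script) {
        // Files.exists also returns false when existence cannot be determined,
        // e.g. when a parent directory is not searchable by this user.
        if (!Files.exists(script)) {
            return "missing or not visible to this user: likely 'No such file or directory'";
        }
        if (!Files.isRegularFile(script)) {
            return "not a regular file";
        }
        if (!Files.isExecutable(script)) {
            return "not executable by this user: likely 'Permission denied'";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // The default path is a hypothetical example; pass the real one as an argument.
        Path script = Paths.get(args.length > 0 ? args[0] : "/opt/scripts/run.sh");
        System.out.println(script + ": " + diagnose(script));
    }
}
```

Comparing the output across nodes would show whether the two different errors line up with missing files on some nodes and missing execute permission on others.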

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

New Contributor

Hi, I'm wondering whether you all determined the root cause of this, or found a solution. I'm having the same problem myself. Thanks!

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

Explorer

No, I never did. Eventually we worked around the problem by installing the script we wanted to invoke on every node in the cluster and adding it to the PATH using Ansible (you could probably do the same with Puppet or Chef). This let us reach our goal without relying on Spark to distribute the script file. It is obviously not ideal, though, since we now have an extra deployment step to update the script on every node whenever it changes. I'm still not too happy about it, but I've moved on.
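With that workaround, the script is invoked by bare name, so every node has to resolve that name on its PATH (and the PATH seen by executor processes may differ from an interactive login shell). A minimal `which`-style lookup sketch; the command name is hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Sketch of a `which`-style PATH lookup; the command name below is hypothetical.
class PathLookup {

    static Optional<Path> which(String command, String pathVar) {
        if (pathVar == null) {
            return Optional.empty();
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries (a shell would treat them as the current directory)
            }
            Path candidate = Paths.get(dir, command);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate); // first match wins, as in a shell
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "my-transform.sh"; // hypothetical script name
        System.out.println(which(name, System.getenv("PATH"))
                .map(p -> "found: " + p)
                .orElse("not on PATH: " + name));
    }
}
```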

Re: RDD.pipe() is resulting in "No such file or directory" only when running in Cloudera

New Contributor

Kevin, thanks for the prompt reply. I'll probably end up doing the same thing (manually shipping a copy to the worker nodes).