Support Questions
Find answers, ask questions, and share your expertise

Loading a local file into Spark

I don't know if this is the best place to post this question, but I'm new to the Hadoop world and this is probably a 101-level question. I have a file that I've FTP'd onto my master node, and I'm using the spark-shell on the master node to test loading it into Spark and eventually HDFS. When I use sc.textFile("file:///.....") I get an error that the file doesn't exist on data node 0, which I thought was weird because I'm running the spark-shell on the master node, where the local file does exist. If I FTP the file to all of my nodes, I can load it from the local directory, but I don't want to have to send the file to every node each time a new one arrives. Is there a way to specify which node the local file is on?

For context: a file will be uploaded to a shared folder on our on-prem filesystem, and I want a job that watches that directory and FTPs the file to the Cloudera VM when it appears. Then a scheduled Spark job can process it and load it into HDFS. I thought about using Flume to pull it into HDFS, but there will probably only be a new file every month or so, and Flume seemed like overkill for that.
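(For anyone landing here with the same error: sc.textFile with a file:// URI is read by the executors, which run on the worker nodes, so each executor tries to open that path on its own local disk; that is why the error mentions a data node even though the file exists on the master. The usual fix is to copy the file into HDFS once from the master and read it from there. A minimal sketch, with hypothetical paths and filenames:)

```scala
// First, on the master node's shell, push the FTP'd file into HDFS
// so every executor can reach it (path is an example):
//   hdfs dfs -put /tmp/monthly_report.csv /user/me/incoming/

// Then, in spark-shell, read it through an hdfs:// URI instead of file://:
val rdd = sc.textFile("hdfs:///user/me/incoming/monthly_report.csv")
println(rdd.count())  // e.g. number of lines in the file
```

Reading from HDFS also means the data is already where the monthly scheduled Spark job expects it, so no separate "load into HDFS" step is needed afterwards.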
