I don't know if this is the best place to post this question, but I'm new to the Hadoop world and this is probably a 101-level question. I have a file that I've FTP'd onto my master node, and I'm using the spark-shell on the master node to test loading it into Spark and eventually into HDFS. When I use sc.textFile("file:///...") I get an error that the file doesn't exist on data node 0, which I thought was weird because I'm running the spark-shell on the master node, where the local file does exist. If I FTP the file to all of my nodes I can load it from the local directory, but I don't want to have to copy the file to every node each time a new one arrives. Is there a way to tell Spark which node the local file lives on? (I've put a minimal version of what I'm running at the end of this post.)

For background: a file will be uploaded to a shared folder on our on-prem filesystem, and I want a job that watches that directory and FTPs the file to the Cloudera VM when it appears. Then I can have a scheduled Spark job process it and load it into HDFS. I thought about using Flume to pull it and load it into HDFS, but there will probably only be a new file every month or so, and Flume seemed like overkill for that.
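For reference, this is roughly what I'm running in the spark-shell on the master node (the path is just a placeholder, not my real one):

```scala
// spark-shell on the master node; sc is the SparkContext the shell provides
// the path below is a placeholder for the file I FTP'd onto this node
val lines = sc.textFile("file:///home/myuser/landing/myfile.txt")

// the error about the file not existing on data node 0 shows up when an
// action like this actually runs on the executors
lines.count()
```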
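And this is a rough sketch of what I'd like the scheduled job to end up doing once the file has been pulled onto the VM (paths are placeholders, and the processing step is just whatever we need at that point):

```scala
// sketch of the monthly job: read the locally FTP'd file, process it,
// and land the result in HDFS (both paths are placeholders)
val raw = sc.textFile("file:///home/myuser/landing/monthly.txt")

// ...processing would go here...

// write the output into HDFS so the rest of the cluster can use it
raw.saveAsTextFile("/user/myuser/monthly/")
```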