Unable to read NTFS windows shared drive path files in spark scala code.

I need to read files from a Windows shared path using Spark and Scala. I tried the code below, but it could not find the file:

import org.apache.spark.{SparkConf, SparkContext}

object ExternalFiles {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("External Files")
    val sc = new SparkContext(conf)
    // UNC path to the file on the Windows share
    val files = sc.textFile("\\\\sharedNetwork\\External Data\\testData.txt")
    files.foreach(println)
  }
}

I also tried sc.textFile("file://sharedNetwork/External Data/testData.txt"), but both attempts fail with the error below:

18/12/23 11:57:57 WARN : Your hostname, name-21 resolves to a loopback/non-reachable address: 10.xx.xx.xxx, but we couldn't find any external IP address!
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file://sharedNetwork/External Data/testData.txt
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)

Can someone suggest how to read files from a shared drive using Spark and Scala?

Please also suggest how to download files from an NTFS Windows shared path to a Linux machine through PuTTY.

Thanks,

Chaitanya

1 REPLY

To upload the file from your Windows machine to a Linux machine, you can use a tool like WinSCP. You configure the session for the Linux machine almost identically to the configuration in PuTTY. It gives you a GUI to copy files.

On the other hand, when you need to access the Windows machine from Linux, you need to configure an FTP or, better, an SFTP server on Windows that allows access to your NTFS path. Alternatively, share the folder over Windows networking (SMB) and install Samba, an implementation of the SMB protocol, on the Linux machine so that it can access the share.
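
Once the file is copied to the Linux machine, or the share is mounted there, Spark can read it as an ordinary local path. In file://sharedNetwork/External Data/testData.txt the sharedNetwork part is parsed as the URI host rather than as a directory, which is likely why the path is reported as missing. A minimal sketch follows, assuming the share is mounted at /mnt/shared (a hypothetical mount point); the object name MountedShareFiles and the helper toFileUri are made up for illustration:

```scala
import java.nio.file.{Files, Paths}
import org.apache.spark.{SparkConf, SparkContext}

object MountedShareFiles {
  // Hypothetical helper: prefix an absolute local path with the file scheme,
  // giving file:///mnt/... with an empty host part.
  def toFileUri(localPath: String): String = "file://" + localPath

  def main(args: Array[String]): Unit = {
    // Assumption: the Windows share is mounted (or the file was copied) here.
    val localPath = args.headOption.getOrElse("/mnt/shared/External Data/testData.txt")

    // Fail early with a clear message if the OS itself cannot see the file.
    require(Files.isReadable(Paths.get(localPath)), s"Cannot read $localPath")

    val conf = new SparkConf().setMaster("local[*]").setAppName("External Files")
    val sc = new SparkContext(conf)
    try {
      // collect() brings the lines to the driver before printing.
      sc.textFile(toFileUri(localPath)).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

This works as written only in local mode. On a cluster, every worker node would need the same mount point, or the file would have to be copied to HDFS first.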