Created on 04-20-2022 09:18 AM - edited 04-20-2022 02:51 PM
Created 04-20-2022 03:55 PM
Hi @Data1701
According to the API documentation, one can get a java.net.URISyntaxException when a passed string could not be parsed as a URI reference.
The file you are attempting to read in might very well be available on your local area network from a shared server drive, but it isn't available via a valid URI, or at the very least, the URI you are referencing in your Spark code isn't a valid and accessible URI.
What your problem boils down to is that the file isn't available via a web server, and the server that is running your Spark code can't retrieve it at the time your code executes. And that should shed light on why you had to previously upload your .csv files into CDSW, because that was the way to ensure that they could be found at runtime, since they were in a well-known/accessible location.
There are several valid approaches to addressing this, but the easiest solution, if you want to continue to use the code snippet you've written and shared here, is to place the file on some server that is accessible over the web (preferably via HTTPS) and refer to it using a fully-qualified URL. In order to do that, a functioning and secured web server will have to be available to you (you could set this up on your local workstation).
Let's assume you place the file on a web-accessible server somewhere local to your corporate network and the web-accessible directory path you place the file in turns out to be something like Data1701/project/data_folder/. Then you can change the assignment statement in your Spark code to this:
df = spark.read.format('csv').load('https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv', header=True)
…and the rest of your code should work, unchanged.
Created 04-20-2022 03:55 PM
Hi @Data1701
According to the API documentation, one can get a java.net.URISyntaxException when a passed string could not be parsed as a URI reference.
The file you are attempting to read in might very well be available on your local area network from a shared server drive, but it isn't available via a valid URI, or at the very least, the URI you are referencing in your Spark code isn't a valid and accessible URI.
What your problem boils down to is that the file isn't available via a web server, and the server that is running your Spark code can't retrieve it at the time your code executes. And that should shed light on why you had to previously upload your .csv files into CDSW, because that was the way to ensure that they could be found at runtime, since they were in a well-known/accessible location.
There are several valid approaches to addressing this, but the easiest solution, if you want to continue to use the code snippet you've written and shared here, is to place the file on some server that is accessible over the web (preferably via HTTPS) and refer to it using a fully-qualified URL. In order to do that, a functioning and secured web server will have to be available to you (you could set this up on your local workstation).
Let's assume you place the file on a web-accessible server somewhere local to your corporate network and the web-accessible directory path you place the file in turns out to be something like Data1701/project/data_folder/. Then you can change the assignment statement in your Spark code to this:
df = spark.read.format('csv').load('https://web.dept.yourcompany.com/Data1701/project/data_folder/file.csv', header=True)
…and the rest of your code should work, unchanged.