Created on 10-16-2017 07:51 AM - edited 09-16-2022 05:24 AM
Hi All,
I've been looking through the CDSW documentation and found the following: "If you want to create a new project around one or more data files on your computer, select the Local option when creating the project."
We're looking at creating projects that combine Local data, data from HDFS, etc. Is this possible? Or can you only use local files in a project that's marked as 'Local'?
Thanks a lot.
Created 10-16-2017 07:55 AM
Really, this is just saying that you can upload data from your local computer, at project creation time or later, to the project's local file system, which is the file system the Python/R/Scala sessions see.
Those jobs then see those local files as simple files, and can do what they like with them.
But within the same program you can also access whatever data you want, wherever it lives; you just need to write code that does so, via Spark or whatever library you prefer.
There is no either/or here.
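As a rough illustration, here is a minimal Python sketch of one session touching both kinds of data. The function names, the file name and the HDFS path are made up for the example, and the Spark part assumes pyspark is installed and the session is already configured to reach your cluster; treat it as a starting point rather than the way CDSW requires you to do it.

```python
import csv


def read_local_rows(path):
    """Read an uploaded CSV from the project's local file system as a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def read_hdfs_csv(hdfs_path):
    """Read a CSV from HDFS as a Spark DataFrame.

    Assumption: pyspark is available and the session can reach the cluster.
    The import is done lazily so the local-file part works without Spark.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("local-plus-hdfs").getOrCreate()
    return spark.read.csv(hdfs_path, header=True)


# Hypothetical usage inside one session (both paths are placeholders):
# local_rows = read_local_rows("data/customers.csv")
# hdfs_df = read_hdfs_csv("hdfs:///data/events.csv")
```

Both calls can sit in the same script; the project type does not restrict which one you use.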
Created 10-17-2017 01:13 AM
Hi, thanks for the information! This is exactly what I expected, but I just wanted to make sure.