
Local Data combined with HDFS

New Contributor

Hi All,

 

I've been looking through the CDSW documentation and found the following: "If you want to create a new project around one or more data files on your computer, select the Local option when creating the project."

 

We're looking at creating projects that combine Local data, data from HDFS, etc. Is this possible? Or can you only use local files in a project that's marked as 'Local'?

 

Thanks a lot.

1 ACCEPTED SOLUTION

Master Collaborator

Really, this is just saying that you can upload data from your local computer, at project creation time or later, to the local file system that the Python/R/Scala sessions see.


Your code then sees those uploaded files as ordinary local files, and can do what it likes with them.


But within the same program you can also access any other data, wherever it lives; you just need to write code that does so. Via Spark, or whatever library you prefer, you can reach other data sources such as HDFS.


There is no either/or here.
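
As a rough illustration of mixing the two in one program, here is a minimal Python sketch. It assumes PySpark is available in the session and that both inputs are small CSV files with a header row. The file paths, the `id` column, and the helper names are all hypothetical, chosen only for this example, and the join assumes the key is unique on the HDFS side.

```python
import csv
from pathlib import Path


def load_local_rows(path):
    """Read a CSV file uploaded to the project (an ordinary local file)."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


def load_hdfs_rows(spark, hdfs_path):
    """Read a CSV from HDFS through an existing SparkSession.

    Rows are collected to the driver, so this only suits small data.
    """
    df = spark.read.csv(hdfs_path, header=True)
    return [row.asDict() for row in df.collect()]


def join_on_key(local_rows, remote_rows, key):
    """Inner-join two lists of dict rows on a shared column.

    Assumes `key` is unique in remote_rows; local values win on a name clash.
    """
    remote_by_key = {row[key]: row for row in remote_rows}
    return [
        {**remote_by_key[row[key]], **row}
        for row in local_rows
        if row[key] in remote_by_key
    ]


def main():
    # Imported here so the helpers above can be used without PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("local-plus-hdfs").getOrCreate()
    local_rows = load_local_rows("data/samples.csv")  # hypothetical project file
    remote_rows = load_hdfs_rows(spark, "hdfs:///tmp/reference.csv")  # hypothetical HDFS path
    print(join_on_key(local_rows, remote_rows, key="id"))
    spark.stop()
```

Calling `main()` from a session would read both sources and print the joined rows. For anything larger than a small lookup table, it would be more sensible to keep the data in Spark DataFrames and join there rather than collecting to the driver.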


New Contributor

Hi, thanks for the information! This is exactly what I expected, but I just wanted to make sure.