New Contributor
Posts: 2
Registered: ‎10-16-2017
Accepted Solution

Local Data combined with HDFS

Hi All,

 

Just looking through the CDSW documentation, I found the following: "If you want to create a new project around one or more data files on your computer, select the Local option when creating the project."

 

We're looking at creating projects that combine Local data, data from HDFS, etc. Is this possible? Or can you only use local files in a project that's marked as 'Local'?

 

Thanks a lot.

Cloudera Employee
Posts: 461
Registered: ‎08-11-2014

Re: Local Data combined with HDFS

Really, this just means you can upload data from your computer, at project creation time or later, to the project's local file system, which is what the Python/R/Scala sessions see.


Those sessions then see the uploaded files as ordinary local files, and can do what they like with them.


But within the same program you can also access whatever other data you want, wherever it lives; you just need to write code that does so, via Spark or whatever library you prefer.


There is no either/or here.
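
To make that concrete, here is a minimal sketch in Python of one way to combine the two, assuming PySpark is available in the session. The file names, the HDFS path, the `customer_id` column, and the helper names are all made up for illustration; they are not from the CDSW documentation. The local file is read with plain Python on the driver side, then handed to Spark and joined with a data set read from HDFS.

```python
import csv


def read_local_rows(path):
    """Read a CSV file from the project's local file system into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def join_with_hdfs(spark, local_rows, hdfs_path, key):
    """Join local rows with a CSV data set in HDFS, given an existing SparkSession."""
    # Turn the locally read rows into a Spark DataFrame (all values are strings).
    local_df = spark.createDataFrame(local_rows)
    # Read the HDFS side through Spark; header=True takes column names from the first row.
    hdfs_df = spark.read.csv(hdfs_path, header=True)
    return local_df.join(hdfs_df, on=key, how="inner")


def main():
    # Imported here so read_local_rows can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("local-plus-hdfs").getOrCreate()
    rows = read_local_rows("data/customers.csv")  # hypothetical uploaded file
    joined = join_with_hdfs(
        spark, rows, "hdfs:///data/orders.csv", key="customer_id"  # hypothetical path and column
    )
    joined.show(5)
```

Calling `main()` in a session would run the whole thing. Reading the small local file on the driver and passing the rows to Spark is a deliberate choice in this sketch: the file exists only on the session's file system, so executors elsewhere on the cluster may not be able to open it by path.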

New Contributor
Posts: 2
Registered: ‎10-16-2017

Re: Local Data combined with HDFS

Hi, thanks for the information! This is exactly what I expected but just wanted to make sure.
