Need some clarification on PipelineModel.save

Hi all,

As per my understanding, PipelineModel.save saves the model to the file system identified by the URI. Before storing it, it collects the individual results from the workers and writes them to the file system provided.

When I run Spark in standalone mode with 2 workers and the save path is on the local Linux FS, say /tmp/examplemodel/, the model is stored on the worker nodes as well. My question is: if the data stored on the workers' FS is intermediate data and the data on the driver is the complete model, why does Spark not clean up the data on the workers once it is no longer needed?

Also, when I use HDFS, where is the workers' intermediate data stored, given that the model finally ends up in one place?
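
To make the two cases concrete, here is a small Python sketch of how I understand the save path to be interpreted. It is not Spark code: save_target is a hypothetical helper, namenode:8020 is a made-up host, and default_fs only stands in for Hadoop's fs.defaultFS setting, which a path without a scheme is resolved against. My assumption is that a file: path is resolved separately on each machine, while an hdfs: path points every machine at the same storage.

```python
from urllib.parse import urlparse


def save_target(path, default_fs="file:///"):
    """Return (scheme, location) for a model save path.

    Hypothetical helper, not part of Spark. `default_fs` stands in for
    Hadoop's fs.defaultFS, used when the path carries no scheme.
    """
    parsed = urlparse(path)
    fallback = urlparse(default_fs)
    # A bare path such as /tmp/examplemodel/ has no scheme of its own,
    # so it inherits the scheme of the default file system.
    scheme = parsed.scheme or fallback.scheme
    if scheme == "file":
        # Assumption: each machine that writes resolves this path
        # against its own local disk.
        return scheme, "local disk of whichever machine runs the write"
    # Any other scheme names storage that all machines reach by address.
    return scheme, "shared file system at " + (parsed.netloc or fallback.netloc)


if __name__ == "__main__":
    print(save_target("/tmp/examplemodel/"))
    print(save_target("hdfs://namenode:8020/models/examplemodel"))
```

If this reading is right, the first call describes my standalone setup (the same path exists independently on the driver and on both workers), and the second describes the HDFS case.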

TIA,

Param.