As I understand it, PipelineModel.save writes the model to the file system identified by the URI: before storing, it collects the individual results from the workers and writes them to the path provided.
When I run Spark in standalone mode with 2 workers and the save path is on the local Linux FS (say /tmp/examplemodel/), the model is stored on the worker nodes as well. My question: if the data stored on the workers' file systems is intermediate data and the driver holds the complete model data, why doesn't Spark clean up the data on the workers once it is no longer needed?
When I use HDFS instead, where is the workers' intermediate data stored, given that the final model ends up in one place?
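For reference, this is roughly how I am saving the model: a minimal Scala sketch (the pipeline stages, data, and app name here are made up for illustration; only the /tmp/examplemodel path matches my setup):

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("save-example").getOrCreate()

// Toy training data, just so the pipeline can be fit before saving.
val training = spark.createDataFrame(Seq(
  (0L, "spark saves models", 1.0),
  (1L, "unrelated text", 0.0)
)).toDF("id", "text", "label")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
val model: PipelineModel = pipeline.fit(training)

// Case 1: local filesystem path -- this is where I see data appearing
// under /tmp/examplemodel on the worker nodes as well as the driver.
model.write.overwrite().save("file:///tmp/examplemodel")

// Case 2: HDFS path -- the saved model ends up in one shared location.
model.write.overwrite().save("hdfs:///tmp/examplemodel")

// Loading back from the shared location works as expected.
val loaded = PipelineModel.load("hdfs:///tmp/examplemodel")
```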