Spark standalone mode not storing the trained model properly


Dear All,

I am using Spark to train a model and save it to the file system using .

This works fine when I run Spark in local mode. But when I run Spark in standalone mode with 2 workers, the model is trained correctly, yet only a partial result is stored when it is saved to the file system. The file system I am using is the local one, i.e. file://.

Could someone please point out what the issue is here?

Also, is it a problem to store the trained PipelineModel on the local file system?

FYI:

The same code works when I use the HDFS file system.

The Spark version I am using is 2.0.0.

Thanks in advance,



My guess here is that each worker writes its share of the model to file://model-path/model-part on its own machine. So maybe each of the two worker machines holds only a part of the model?

With HDFS the model-path is the same for every worker, so the model is saved completely. For a distributed system to store and load data, all workers need to be able to access the same data under the same path. That's why a distributed file system is the usual recommendation. Ignoring performance, replication and so on, you should also be able to mount a network file system (SAN, NAS, NFS, ...) on every machine and use the same path there; however, this is not recommended.
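To make the reasoning above concrete, here is a minimal simulation in plain Python. It does not use Spark. The two "workers" are just directories, the round-robin assignment of partitions and the names worker1, worker2 and shared are assumptions made for illustration, and model-path is taken from the guess above. It only shows why the same relative path gives a split result on separate local disks and a complete result on shared storage.

```python
import os
import tempfile


def save_partitions(partitions, worker_roots, model_path):
    """Simulate workers saving: each one writes only its own partitions
    under its own root directory (its 'local disk')."""
    for idx, data in enumerate(partitions):
        # Round-robin assignment of partitions to workers (an assumption).
        root = worker_roots[idx % len(worker_roots)]
        out_dir = os.path.join(root, model_path)
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, f"part-{idx:05d}"), "w") as f:
            f.write(data)


def list_parts(root, model_path):
    """Return the part files visible under model_path from one machine's view."""
    out_dir = os.path.join(root, model_path)
    return sorted(os.listdir(out_dir)) if os.path.isdir(out_dir) else []


partitions = ["a", "b", "c", "d"]  # stand-ins for pieces of the saved model

with tempfile.TemporaryDirectory() as base:
    # Case 1: every worker has its own local disk (like file:// on two machines).
    worker1 = os.path.join(base, "worker1")
    worker2 = os.path.join(base, "worker2")
    save_partitions(partitions, [worker1, worker2], "model-path")
    local_view_1 = list_parts(worker1, "model-path")
    local_view_2 = list_parts(worker2, "model-path")

    # Case 2: both workers see the same storage (like HDFS or a shared mount).
    shared = os.path.join(base, "shared")
    save_partitions(partitions, [shared, shared], "model-path")
    shared_view = list_parts(shared, "model-path")

print("worker1 local disk:", local_view_1)
print("worker2 local disk:", local_view_2)
print("shared storage:    ", shared_view)
```

In the first case neither directory holds all four part files, while the shared directory does, which matches the difference seen between file:// and HDFS.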


Thank you for the response,

I agree with you completely, but my question is that even the partial result is not getting stored correctly.
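One way to narrow this down might be to look at what actually landed under the model path on the driver and on each worker machine. The sketch below is plain Python, not a Spark API. The helper name summarize_output is hypothetical, and it assumes the usual Hadoop-style output layout that Spark writers produce (part-* data files, _SUCCESS markers and _temporary working directories).

```python
import os


def summarize_output(path):
    """Walk a saved-model directory and count the entries a Hadoop-style
    writer usually leaves behind (layout assumed, see the note above)."""
    summary = {"part_files": 0, "success_markers": 0, "temporary_dirs": 0}
    for _root, dirs, files in os.walk(path):
        summary["temporary_dirs"] += sum(1 for d in dirs if d == "_temporary")
        # Do not descend into _temporary, so uncommitted files are not
        # counted as finished part files.
        dirs[:] = [d for d in dirs if d != "_temporary"]
        summary["success_markers"] += sum(1 for f in files if f == "_SUCCESS")
        summary["part_files"] += sum(1 for f in files if f.startswith("part-"))
    return summary
```

Running summarize_output on the same model path on every machine and comparing the results would show where the part files ended up. Leftover _temporary directories usually suggest output that was written but never moved into its final location.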