
pyspark.ml.Pipeline running in parallel

Hello friends,

I am new to Spark ML pipelines, so I was wondering if you could give me some pointers on the following issue.

I have written a custom Estimator and Model in Python, and I use them like this:

from pyspark.ml import Pipeline
pipeline = Pipeline(stages=[MyEstimator()])
pipelinemodel = pipeline.fit(train_df)
results = pipelinemodel.transform(test_df)

It works fine, but I need to train 50 models in parallel. What is the best way to run pipeline.fit() and transform() in parallel? Is there any support for this in Spark ML?
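A minimal sketch of one common approach, assuming all fits share a single SparkSession: Spark can run several jobs at once inside one application when they are submitted from separate threads, so the fits can be driven from a Python thread pool. The helper name `fit_and_transform_all`, the `(pipeline, train_df, test_df)` job tuples and the `max_workers=8` default are hypothetical choices for illustration, not part of Spark ML. It also assumes that `MyEstimator.fit()` mostly launches Spark jobs rather than doing heavy Python computation on the driver, since the GIL would otherwise serialize that work.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_and_transform_all(jobs, max_workers=8):
    """Hypothetical helper (not a Spark API): fit and apply pipelines concurrently.

    `jobs` is a list of (pipeline, train_df, test_df) tuples. `pipeline` can be
    any object whose .fit(df) returns a model with a .transform(df) method,
    such as a pyspark.ml.Pipeline. Returns a list of (model, transformed)
    pairs in the same order as `jobs`.
    """

    def run_one(job):
        pipeline, train_df, test_df = job
        # fit() triggers Spark jobs; calling it from a worker thread lets the
        # Spark scheduler run several of these jobs at the same time.
        model = pipeline.fit(train_df)
        # transform() on a Spark DataFrame is lazy: it only builds a plan.
        # Add an action here (e.g. a write) if the results should also be
        # computed inside this thread.
        return model, model.transform(test_df)

    # Threads (not processes) suffice when the heavy work runs in Spark
    # rather than in Python code on the driver.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; an exception raised in any
        # job is re-raised here when its result is retrieved.
        return list(pool.map(run_one, jobs))
```

With the pipeline from the post, `jobs` might be built as `[(Pipeline(stages=[MyEstimator()]), train_df, test_df) for _ in range(50)]`, in practice with different parameters or data for each model. By default Spark runs concurrent jobs in FIFO order; setting `spark.scheduler.mode` to `FAIR` lets them share cluster resources more evenly.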