pyspark.ml.Pipeline running in parallel

Hello friends,

I am new to Spark ML pipelines, so I was wondering if you could give me some pointers on the following issue.

I have written a custom Estimator and Model in Python, and I use them like this:

from pyspark.ml import Pipeline

pipeline = Pipeline(stages=[MyEstimator()])
pipelinemodel = pipeline.fit(train_df)
results = pipelinemodel.transform(test_df)

It works fine, but I need to train 50 models in parallel. What is the best way to run pipeline.fit() and transform() in parallel? Is there any built-in support for this in Spark ML?
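One common approach, shown here as a minimal sketch rather than a built-in Spark ML feature, is to submit each fit from its own driver thread. Spark's scheduler can run jobs concurrently when they are submitted from separate threads of the same SparkContext, and setting `spark.scheduler.mode` to `FAIR` may help those jobs share executors. The helper name `fit_and_transform_in_parallel` and the `datasets` argument (a list of `(train_df, test_df)` pairs) are hypothetical, and the sketch assumes the custom estimator's `_fit` does not mutate shared state, so one `Pipeline` object can safely be fitted from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_and_transform_in_parallel(pipeline, datasets, max_workers=8):
    """Hypothetical helper: fit `pipeline` once per (train_df, test_df) pair.

    Each pair is handled in its own driver thread. Returns a list of
    (model, transformed_df) tuples in the same order as `datasets`.
    """

    def fit_one(pair):
        train_df, test_df = pair
        # fit() usually triggers Spark jobs; jobs submitted from different
        # threads can be scheduled concurrently by the same SparkContext.
        model = pipeline.fit(train_df)
        # transform() is lazy for DataFrames; call an action here (e.g. a write)
        # if that work should also run inside this thread.
        return model, model.transform(test_df)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results[i] corresponds to datasets[i].
        return list(pool.map(fit_one, datasets))
```

With Spark this would be called as, for example, `results = fit_and_transform_in_parallel(pipeline, list(zip(train_dfs, test_dfs)))`, where `train_dfs` and `test_dfs` are your 50 training and test DataFrames. Threads (rather than processes) are used because the driver mostly waits on the JVM while jobs run, and a single SparkContext cannot be shared across Python processes.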