Support Questions

Find answers, ask questions, and share your expertise

Combine MLlib Prediction and Features on Dstreams

avatar
Explorer
Hi,
 
I need help on Dstream operation. 
 
In fact, I am using a MLlib randomforest model to predict using spark streaming. In the end, I want to combine the feature Dstream & prediction Dstream together for further downstream processing.
 
I am predicting using below piece of code:
 
predictions = texts.map( lambda x : getFeatures(x) ).map(lambda x : x.split(',')).map( lambda parts : [float(i) for i in parts] ).transform(lambda rdd: rf_model.predict(rdd))
 
Here texts is dstream having single line of text as records
getFeatures generates a comma separated features extracted from each record
 
 
I want the output as below tuple:
("predicted value", "original text")
 
How can I achieve that ? 
or 
at least can I perform .zip like normal RDD operation on two Dstreams, like below:
output = texts.zip(predictions)
 
 
Note: I posted the same problem on spark user mailing list.
 
Thanks,
Obaid

 

 

1 ACCEPTED SOLUTION

avatar
Explorer
1 REPLY 1

avatar
Explorer

Hi,

 

I have the solution.

Please check my post in stackoverflow:

http://stackoverflow.com/questions/37466361/how-to-combine-two-dstreams-using-pyspark-similar-to-zip...

 

Thanks,

Obaid