Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Combine MLlib Prediction and Features on Dstreams

avatar
Explorer
Hi,
 
I need help on Dstream operation. 
 
In fact, I am using a MLlib randomforest model to predict using spark streaming. In the end, I want to combine the feature Dstream & prediction Dstream together for further downstream processing.
 
I am predicting using below piece of code:
 
predictions = texts.map( lambda x : getFeatures(x) ).map(lambda x : x.split(',')).map( lambda parts : [float(i) for i in parts] ).transform(lambda rdd: rf_model.predict(rdd))
 
Here texts is dstream having single line of text as records
getFeatures generates a comma separated features extracted from each record
 
 
I want the output as below tuple:
("predicted value", "original text")
 
How can I achieve that ? 
or 
at least can I perform .zip like normal RDD operation on two Dstreams, like below:
output = texts.zip(predictions)
 
 
Note: I posted the same problem on spark user mailing list.
 
Thanks,
Obaid

 

 

1 ACCEPTED SOLUTION

avatar
Explorer
1 REPLY 1

avatar
Explorer

Hi,

 

I have the solution.

Please check my post in stackoverflow:

http://stackoverflow.com/questions/37466361/how-to-combine-two-dstreams-using-pyspark-similar-to-zip...

 

Thanks,

Obaid