Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Combine MLlib Prediction and Features on Dstreams

Solved Go to solution
Highlighted

Combine MLlib Prediction and Features on Dstreams

New Contributor
Hi,
 
I need help on Dstream operation. 
 
In fact, I am using a MLlib randomforest model to predict using spark streaming. In the end, I want to combine the feature Dstream & prediction Dstream together for further downstream processing.
 
I am predicting using below piece of code:
 
predictions = texts.map( lambda x : getFeatures(x) ).map(lambda x : x.split(',')).map( lambda parts : [float(i) for i in parts] ).transform(lambda rdd: rf_model.predict(rdd))
 
Here texts is dstream having single line of text as records
getFeatures generates a comma separated features extracted from each record
 
 
I want the output as below tuple:
("predicted value", "original text")
 
How can I achieve that ? 
or 
at least can I perform .zip like normal RDD operation on two Dstreams, like below:
output = texts.zip(predictions)
 
 
Note: I posted the same problem on spark user mailing list.
 
Thanks,
Obaid

 

 

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Combine MLlib Prediction and Features on Dstreams

New Contributor

Hi,

 

I have the solution.

Please check my post in stackoverflow:

http://stackoverflow.com/questions/37466361/how-to-combine-two-dstreams-using-pyspark-similar-to-zip...

 

Thanks,

Obaid

1 REPLY 1

Re: Combine MLlib Prediction and Features on Dstreams

New Contributor

Hi,

 

I have the solution.

Please check my post in stackoverflow:

http://stackoverflow.com/questions/37466361/how-to-combine-two-dstreams-using-pyspark-similar-to-zip...

 

Thanks,

Obaid