Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Combine MLlib Prediction and Features on Dstreams

Explorer
Hi,
 
I need help on Dstream operation. 
 
In fact, I am using a MLlib randomforest model to predict using spark streaming. In the end, I want to combine the feature Dstream & prediction Dstream together for further downstream processing.
 
I am predicting using below piece of code:
 
predictions = texts.map( lambda x : getFeatures(x) ).map(lambda x : x.split(',')).map( lambda parts : [float(i) for i in parts] ).transform(lambda rdd: rf_model.predict(rdd))
 
Here texts is dstream having single line of text as records
getFeatures generates a comma separated features extracted from each record
 
 
I want the output as below tuple:
("predicted value", "original text")
 
How can I achieve that ? 
or 
at least can I perform .zip like normal RDD operation on two Dstreams, like below:
output = texts.zip(predictions)
 
 
Note: I posted the same problem on spark user mailing list.
 
Thanks,
Obaid

 

 

1 ACCEPTED SOLUTION

Explorer
1 REPLY 1

Explorer

Hi,

 

I have the solution.

Please check my post in stackoverflow:

http://stackoverflow.com/questions/37466361/how-to-combine-two-dstreams-using-pyspark-similar-to-zip...

 

Thanks,

Obaid

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.