Combine MLlib Prediction and Features on Dstreams
Labels: Apache Spark
Explorer
Created on ‎05-27-2016 05:19 AM - edited ‎09-16-2022 03:22 AM
Hi,
I need help with a DStream operation.
I am using an MLlib random forest model to make predictions in Spark Streaming. In the end, I want to combine the feature DStream and the prediction DStream for further downstream processing.
I am predicting with the following code:
predictions = (texts
    .map(lambda x: getFeatures(x))
    .map(lambda x: x.split(','))
    .map(lambda parts: [float(i) for i in parts])
    .transform(lambda rdd: rf_model.predict(rdd)))
Here, texts is a DStream whose records are single lines of text.
getFeatures returns a comma-separated string of features extracted from each record.
I want the output to be a tuple like this:
("predicted value", "original text")
How can I achieve that?
Alternatively, can I at least apply an RDD-style .zip to two DStreams, like this:
output = texts.zip(predictions)
Note: I also posted this question on the Spark user mailing list.
Thanks,
Obaid
1 ACCEPTED SOLUTION
Explorer
Created ‎05-30-2016 10:39 PM
Hi,
I found the solution.
Please see my post on Stack Overflow:
Thanks,
Obaid
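
For readers who cannot reach the linked post: one way to get the ("predicted value", "original text") tuples from the question is to do the pairing inside a single transform call, so that each micro-batch RDD is zipped with its own predictions. The sketch below is an assumption about how this could be done, not a copy of the linked solution. parse_features and predict_with_text are hypothetical helper names; texts, rf_model and getFeatures are the names used in the question.

```python
def parse_features(feature_line):
    """Turn a comma-separated feature string into a list of floats."""
    return [float(value) for value in feature_line.split(",")]


def predict_with_text(rdd, model, get_features):
    """Pair each prediction with the text record it was computed from.

    rdd          -- one micro-batch of raw text lines (as seen inside transform)
    model        -- a trained MLlib model whose predict() accepts an RDD
    get_features -- function mapping a text line to "f1,f2,..." (assumed shape)
    """
    # Build the feature vectors from the same batch we will zip against.
    features = rdd.map(lambda text: parse_features(get_features(text)))
    predictions = model.predict(features)
    # zip pairs elements by position, giving (prediction, original text).
    return predictions.zip(rdd)


# Usage on the DStream from the question (left as comments so the block
# can be read without a running StreamingContext):
# output = texts.transform(lambda rdd: predict_with_text(rdd, rf_model, getFeatures))
# output.pprint()
```

RDD.zip expects both sides to have the same number of partitions and the same number of elements in each partition, which should hold here because the predictions are derived element by element from the same batch. As far as I can tell, PySpark DStreams do not expose a zip method of their own. To combine two DStreams that already exist, transformWith looks like the closest equivalent, for example texts.transformWith(lambda a, b: b.zip(a), predictions), under the same partitioning assumption.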
