Combine MLlib Prediction and Features on Dstreams

Obaidul — Fri, 16 Sep 2022 10:22:04 GMT

Hi,

I need help with a DStream operation.

I am using an MLlib random forest model to make predictions with Spark Streaming. In the end, I want to combine the feature DStream and the prediction DStream for further downstream processing.

I am predicting with the following code:

predictions = (texts
    .map(lambda x: getFeatures(x))
    .map(lambda x: x.split(','))
    .map(lambda parts: [float(i) for i in parts])
    .transform(lambda rdd: rf_model.predict(rdd)))

Here, texts is a DStream whose records are single lines of text.

getFeatures returns a comma-separated string of the features extracted from each record.

I want the output as the following tuple:

("predicted value", "original text")

How can I achieve that?

At the very least, can I perform a .zip on two DStreams, as with normal RDDs, like below:

output = texts.zip(predictions)

Note: I posted the same problem on the Spark user mailing list.

Thanks,

Obaid

Re: Combine MLlib Prediction and Features on Dstreams

Obaidul — Tue, 31 May 2016 05:39:57 GMT

Hi,

I have the solution.

Please check my post on Stack Overflow:

http://stackoverflow.com/questions/37466361/how-to-combine-two-dstreams-using-pyspark-similar-to-zip-on-normal-rdd/37537555#37537555
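
For anyone reading this without following the link, here is a minimal sketch of one way to get the ("predicted value", "original text") pairs. It assumes the zip is done per batch, on the RDDs inside transform, rather than on the two DStreams themselves. The helper names predict_with_text and build_output are made up for illustration, and this is not necessarily the approach taken in the linked answer.

```python
def predict_with_text(rdd, model, featurize):
    """Return an RDD of (prediction, original_text) pairs for one batch.

    rdd       -- RDD of text lines (one batch of the `texts` DStream)
    model     -- a trained MLlib model whose predict() accepts an RDD
    featurize -- function mapping a line to a comma-separated feature string
    """
    # Turn each line into a list of floats, as in the original pipeline.
    features = rdd.map(
        lambda text: [float(v) for v in featurize(text).split(',')]
    )
    # predict() is called on the whole RDD; its result is derived from
    # `rdd` by per-record steps, so it can be zipped back with `rdd`.
    return model.predict(features).zip(rdd)


def build_output(texts, model, featurize):
    """Apply predict_with_text to every batch of the `texts` DStream."""
    return texts.transform(
        lambda rdd: predict_with_text(rdd, model, featurize)
    )


# Usage with the names from the question (hypothetical wiring):
# output = build_output(texts, rf_model, getFeatures)
# output.pprint()
```

Because each batch RDD is read twice here (once for the features, once for the zip), caching it inside predict_with_text may be worthwhile if the source is expensive to recompute.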

Thanks,

Obaid
