How to Use a Spark MLlib Model in Storm?
- Labels: Apache Spark, Apache Storm
Created ‎03-22-2016 03:58 PM
Is there a way to train a model offline in Spark MLlib and then use it for online ML in Storm?
Created ‎03-22-2016 04:14 PM
You can use PMML (https://de.wikipedia.org/wiki/Predictive_Model_Markup_Language).
Spark supports exporting some (but not all) models to PMML:
http://spark.apache.org/docs/latest/mllib-pmml-model-export.html
(UPDATE: As @Simon Elliston Ball rightly points out in his answer, if PMML export is not supported for your model, the Spark libraries can be reused directly, as most of the model classes have no dependency on the SparkContext.)
One way could be to use JPMML with Java in Storm:
http://henning.kropponline.de/2015/09/06/jpmml-example-random-forest/
https://github.com/jpmml/jpmml-storm
The other could be to use R in Storm. I have seen it done, but don't have a reference at hand.
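To make the PMML route a bit more concrete: a PMML file for a linear model is essentially an intercept plus one coefficient per feature, so scoring it inside a bolt needs no Spark at all. The sketch below is a hand-rolled illustration in plain Java, not JPMML and not a general PMML evaluator. The class name `PmmlLinearScorer`, the sample XML, and the field names `field_0`/`field_1` are assumptions made up for this example. In a real topology you would more likely let JPMML do the parsing and evaluation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical scorer that reads only the first RegressionTable of a PMML document. */
final class PmmlLinearScorer {
    private final double intercept;
    private final Map<String, Double> coefficients = new HashMap<>();

    PmmlLinearScorer(String pmmlXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not parseable XML", e);
        }
        Element table = (Element) doc.getElementsByTagName("RegressionTable").item(0);
        if (table == null) {
            throw new IllegalArgumentException("No RegressionTable element found");
        }
        intercept = Double.parseDouble(table.getAttribute("intercept"));
        // Each NumericPredictor carries a feature name and its coefficient.
        NodeList predictors = table.getElementsByTagName("NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            coefficients.put(p.getAttribute("name"),
                    Double.parseDouble(p.getAttribute("coefficient")));
        }
    }

    /** Computes intercept + sum(coefficient * value); features missing from the map count as 0. */
    double score(Map<String, Double> features) {
        double y = intercept;
        for (Map.Entry<String, Double> c : coefficients.entrySet()) {
            y += c.getValue() * features.getOrDefault(c.getKey(), 0.0);
        }
        return y;
    }

    public static void main(String[] args) {
        // Made-up minimal document; a real export also contains a DataDictionary and MiningSchema.
        String pmml =
              "<PMML xmlns=\"http://www.dmg.org/PMML-4_2\" version=\"4.2\">"
            + "<RegressionModel functionName=\"regression\">"
            + "<RegressionTable intercept=\"1.0\">"
            + "<NumericPredictor name=\"field_0\" coefficient=\"2.0\"/>"
            + "<NumericPredictor name=\"field_1\" coefficient=\"-0.5\"/>"
            + "</RegressionTable></RegressionModel></PMML>";
        Map<String, Double> tuple = new HashMap<>();
        tuple.put("field_0", 3.0);
        tuple.put("field_1", 4.0);
        System.out.println(new PmmlLinearScorer(pmml).score(tuple)); // prints 5.0
    }
}
```

In a bolt you would build the scorer once in `prepare` and call `score` for each tuple in `execute`.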
Created ‎03-22-2016 04:17 PM
In an advanced architecture, you would use ZooKeeper to announce a new model to the topology without taking it offline.
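On the bolt side, the part that makes this work is that the model reference can be replaced while tuples keep flowing. A minimal sketch of that piece in plain Java, assuming some notification callback (for example a ZooKeeper watcher, not shown here) calls `publish` once the new model has been loaded. The names `HotSwappableModel`, `Versioned`, and `publish` are hypothetical and not part of Storm or ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.ToDoubleFunction;

/** Holds the model a bolt scores with; it can be replaced without stopping the bolt. */
final class HotSwappableModel {

    /** Immutable pairing of a version label and a scoring function. */
    static final class Versioned {
        final String version;
        final ToDoubleFunction<double[]> scorer;

        Versioned(String version, ToDoubleFunction<double[]> scorer) {
            this.version = version;
            this.scorer = scorer;
        }
    }

    private final AtomicReference<Versioned> current;

    HotSwappableModel(Versioned initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Assumed to be called from the notification callback after the new model is fully loaded. */
    void publish(Versioned next) {
        current.set(next);
    }

    /** Called per tuple; always sees either the old or the new model, never a half-built one. */
    double score(double[] features) {
        return current.get().scorer.applyAsDouble(features);
    }

    String version() {
        return current.get().version;
    }
}
```

Because the swap is a single atomic reference update, in-flight tuples finish on the old model and later tuples pick up the new one, so the topology does not need a restart.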
Created ‎03-22-2016 05:08 PM
PMML is certainly a good option, but be aware that Spark does not support the transformation elements of PMML, so you will need to recreate any feature scaling and transformation before the scoring step.
The other thing to note is that many of the Spark model classes do not depend on the SparkContext, so you can link Spark into your Storm topology and just use the Spark model itself.
This can lead to some unnecessary code in your jar, but has the advantage that you don't need to go through the PMML format.
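On the first point, recreating the scaling step: if the model was trained on standardized features, the bolt has to apply the same means and standard deviations before scoring. A minimal sketch in plain Java, assuming you export those per-feature statistics from training yourself. The class name `TrainedStandardScaler` is hypothetical, and mapping a zero standard deviation to 0.0 is a choice made for this sketch.

```java
/** Re-applies standardization with statistics captured at training time (illustration only). */
final class TrainedStandardScaler {
    private final double[] mean;
    private final double[] std;

    TrainedStandardScaler(double[] mean, double[] std) {
        if (mean.length != std.length) {
            throw new IllegalArgumentException("mean and std must have the same length");
        }
        this.mean = mean.clone();
        this.std = std.clone();
    }

    /** Returns (raw - mean) / std per feature; constant features (std == 0) become 0.0. */
    double[] transform(double[] raw) {
        if (raw.length != mean.length) {
            throw new IllegalArgumentException(
                    "expected " + mean.length + " features, got " + raw.length);
        }
        double[] scaled = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            scaled[i] = std[i] == 0.0 ? 0.0 : (raw[i] - mean[i]) / std[i];
        }
        return scaled;
    }
}
```

The important part is that the statistics come from the training data and stay fixed at scoring time; recomputing them on the stream would give the model inputs it was never trained on.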
Created ‎03-22-2016 07:22 PM
+1 for reusing the Spark code itself.
