Speed layer in Oryx2
Created ‎05-11-2015 03:01 AM
Hi,
Does the speed layer in Oryx2 follow an incremental learning approach during model updates?
Thanks
Jayani
Created ‎05-11-2015 04:07 AM
The speed layer does indeed produce incremental updates. Like the serving layer, it loads the most recent model into memory. It then computes, approximately but quickly, how the model might change in response to new data, applies those updates to its own copy, and publishes them. The serving layers listen on the queue for both new models and these updates, and update themselves accordingly.
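The flow described in this answer can be sketched roughly as follows. This is a toy simulation, not Oryx code: the `SpeedLayer` and `ServingLayer` classes, the `"MODEL"` and `"UP"` message labels, the list standing in for the queue, and the running-mean "model" are all assumptions made for illustration.

```python
class SpeedLayer:
    """Toy speed layer: keeps the latest model in memory and emits small updates."""

    def __init__(self, log):
        self.log = log      # stands in for the shared update queue (hypothetical)
        self.model = None   # replaced whenever a full model arrives

    def on_message(self, key, payload):
        # A full model from the batch layer replaces the in-memory copy.
        if key == "MODEL":
            self.model = dict(payload)

    def on_new_data(self, item, value):
        # Approximate, cheap update: a running mean that touches only one item.
        if self.model is None:
            return  # nothing to update until a model has been loaded
        count, mean = self.model.get(item, (0, 0.0))
        count += 1
        mean += (value - mean) / count
        self.model[item] = (count, mean)              # apply to own copy
        self.log.append(("UP", (item, count, mean)))  # publish the update


class ServingLayer:
    """Toy serving layer: applies both full models and incremental updates."""

    def __init__(self):
        self.model = {}

    def on_message(self, key, payload):
        if key == "MODEL":
            self.model = dict(payload)
        elif key == "UP":
            item, count, mean = payload
            self.model[item] = (count, mean)


log = []
speed, serving = SpeedLayer(log), ServingLayer()

# The batch layer publishes a full model; the speed layer loads it.
log.append(("MODEL", {"a": (2, 1.0)}))
speed.on_message(*log[0])

# New data arrives; the speed layer publishes one update per data point.
speed.on_new_data("a", 4.0)
speed.on_new_data("b", 5.0)

# The serving layer replays the queue: first the model, then the updates.
for key, payload in log:
    serving.on_message(key, payload)
```

After the replay, the serving layer's in-memory model matches the speed layer's copy, without the batch layer having rebuilt anything.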
Created on ‎05-11-2015 06:33 AM - edited ‎05-11-2015 06:35 AM
Thanks Sean for the prompt response 🙂
So, can we use or customize the same speed layer approach for mini-batch learning as well?
Also, does Oryx have any plans to support built-in pre-processing methods for text analysis, such as tokenization and TF-IDF vector creation?
Thank you
Jayani
Created ‎05-11-2015 07:00 AM
Yes. I'd say the batch layer can already do "mini-batch" learning if you simply use a short interval time; it's not really a special case. I don't think this project is going to add its own data prep pipeline, but the idea is that you can use any Java or Spark-based libraries you like as part of your app, so there's no need for a separate, special set of support in this project.
Created ‎05-11-2015 07:15 AM
Okay. Thanks Sean for the information.
Created ‎05-21-2015 10:48 PM
Sean,
Can you provide more detail on how the approximation is computed in Oryx 2.0?
Is it the same fold-in approach as in Oryx 1.0? Can you point to the relevant code as a reference?
Thanks.
Jason
Created ‎05-22-2015 12:23 AM
- ALS: yes, fold-in just as before
- k-means: assign point to a cluster and update its centroid (but don't reassign any other points)
- RDF: assign point to leaf and update leaf's prediction (but don't change the rest of the tree)
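To make the k-means and RDF cases concrete, here is a minimal sketch of what such single-point updates could look like. It is not taken from the Oryx code base: the function names, the nested-dict tree layout, and the use of running means for both the centroid and a regression leaf's prediction are assumptions for illustration. The ALS fold-in is not shown.

```python
import math


def update_kmeans(centroids, counts, point):
    """Assign point to its nearest centroid and move only that centroid."""
    i = min(range(len(centroids)), key=lambda j: math.dist(centroids[j], point))
    counts[i] += 1
    n = counts[i]
    # Running mean: the centroid shifts toward the new point by 1/n.
    centroids[i] = [c + (x - c) / n for c, x in zip(centroids[i], point)]
    return i  # no other points are reassigned


def update_leaf(tree, features, target):
    """Route an example to its leaf and update only that leaf's prediction."""
    node = tree
    while "leaf" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    leaf = node["leaf"]
    leaf["count"] += 1
    leaf["mean"] += (target - leaf["mean"]) / leaf["count"]
    return leaf  # splits and all other leaves are left unchanged


# k-means: two clusters, each assumed to summarize 4 points so far.
centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [4, 4]
assigned = update_kmeans(centroids, counts, [5.0, 0.0])

# Regression tree with a single split on feature 0 (hypothetical layout).
tree = {
    "feature": 0,
    "threshold": 2.0,
    "left": {"leaf": {"count": 1, "mean": 10.0}},
    "right": {"leaf": {"count": 3, "mean": 20.0}},
}
leaf = update_leaf(tree, [3.0], 24.0)
```

In both cases the cost of an update is independent of how much data the model was originally built from, which is what makes this kind of approximation fast enough for a speed layer.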
