Created on 07-01-2015 05:55 PM - edited 07-01-2015 10:16 PM
Sean,
(1) How does the Oryx serving layer know that there is a new model generation?
I traced the code, and it seems to poll continuously, checking the generation status from a folder.
Also, the polling delay seems to be different when the status is "no available generation" (a different number of minutes)...
Is that right? Can you explain a little?
(2) When a new generation replaces the old one, I see a variable "recentlyActiveUsers"... What is it used for?
(3) A question on "loadModel(int generationID, Generation currentGeneration)": how is the old generation's model replaced by the new generation's?
Originally, I thought the new generation was loaded from disk into memory, and then the old generation in memory was replaced by the loaded
new generation. However, after tracing the code, it seems there is no such "hot swap".
Based on my tracing, assume currentGeneration=2 and generationID=3 (that is, generation 2 is about to be replaced by generation 3).
It seems that (step 1) the new model is written into currentGeneration and (step 2) the old entries left in currentGeneration are removed.
If so, it looks possible to have an "intermediate" model that mixes the old and new generations. Will this "intermediate" model be queried
by the recommender? Can you confirm or explain a little?
Thanks.
Created 07-02-2015 12:14 AM
It just polls HDFS for new files, on the order of every ~5 minutes or so. No, that message comes from exactly this same process of refreshing the model by looking for any new model. "No available generation" means no models have been built yet.
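To make the polling idea concrete, here is a minimal sketch. It is not the Oryx code: the class GenerationPoller and its methods are hypothetical names, it reads a local directory through java.nio.file instead of HDFS, and it assumes each generation is a subdirectory whose name is a number.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;

// Hypothetical sketch of "poll a directory for a newer generation".
// Assumption: generations are subdirectories with numeric names.
final class GenerationPoller {

    private final Path generationsDir;
    // -1 stands for the "no available generation" state.
    private int currentGeneration = -1;

    GenerationPoller(Path generationsDir) {
        this.generationsDir = generationsDir;
    }

    /** Returns the highest numeric subdirectory name, or empty if there is none. */
    static OptionalInt findLatestGeneration(Path dir) throws IOException {
        int latest = -1;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                // Up to 9 digits so that Integer.parseInt cannot overflow.
                if (Files.isDirectory(child) && name.matches("\\d{1,9}")) {
                    latest = Math.max(latest, Integer.parseInt(name));
                }
            }
        }
        return latest < 0 ? OptionalInt.empty() : OptionalInt.of(latest);
    }

    /** One polling pass. Returns true if a newer generation was found. */
    synchronized boolean pollOnce() throws IOException {
        OptionalInt latest = findLatestGeneration(generationsDir);
        if (latest.isPresent() && latest.getAsInt() > currentGeneration) {
            currentGeneration = latest.getAsInt();
            return true; // the caller would start loading this generation
        }
        return false; // nothing new, or nothing built yet
    }

    synchronized int getCurrentGeneration() {
        return currentGeneration;
    }
}
```

In a real server, pollOnce() would typically be run on a timer, for example with ScheduledExecutorService.scheduleWithFixedDelay and a delay of a few minutes.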
There's a delay between the time new data arrives -- which could include a new user or item -- and the time it is incorporated into a model. It could be a long time, depending on how long you take to build models. When a new model arrives, you can't just drop all existing users, since the new model won't have any info about very new users or items. recentlyActiveUsers is there to help keep track of which users/items should be retained in memory even if they do not exist in the new model.
The new model replaces the old one user-by-user and item-by-item rather than by loading an entire new model and swapping it in. Yes, you have a state with old and new data at once, but this is fine for recommendations; they're not incompatible. It's just the current and the newer state of an estimate of the user/item vectors.
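As a rough illustration of the entry-by-entry replacement and the retention of recent users, here is a small sketch. It is not the actual loadModel code: IncrementalModel, recordActivity and loadGeneration are hypothetical names, only users are shown (items would work the same way), and clearing the recently-active set after a load is an assumption made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of replacing a model entry by entry, without a hot swap.
final class IncrementalModel {

    private final Map<String, float[]> userVectors = new ConcurrentHashMap<>();
    // Users seen since the last model load; they may be missing from the next model.
    private final Set<String> recentlyActiveUsers = ConcurrentHashMap.newKeySet();

    /** Called when new data for a user arrives between model builds. */
    void recordActivity(String userID, float[] estimatedVector) {
        userVectors.put(userID, estimatedVector);
        recentlyActiveUsers.add(userID);
    }

    /** Merges a new generation into the live model, one user at a time. */
    void loadGeneration(Map<String, float[]> newVectors) {
        // Step 1: overwrite or add each user from the new generation.
        // Readers running concurrently can see a mix of old and new vectors here.
        for (Map.Entry<String, float[]> e : newVectors.entrySet()) {
            userVectors.put(e.getKey(), e.getValue());
        }
        // Step 2: drop users absent from the new generation,
        // unless they were recently active and so are too new for it.
        userVectors.keySet().removeIf(
            id -> !newVectors.containsKey(id) && !recentlyActiveUsers.contains(id));
        // Assumption for this sketch: start tracking afresh after each load.
        recentlyActiveUsers.clear();
    }

    float[] getUserVector(String userID) {
        return userVectors.get(userID);
    }
}
```

For example, if generation 2 holds users a and b, user c then becomes active, and generation 3 contains only a, then after loadGeneration the model holds the new vector for a, no entry for b, and still the interim vector for c.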