Created 12-12-2014 02:22 PM
Hi
I would like to implement a sliding window type of data decay with Oryx. This is because I have 2 streams of searches coming in:
- real time searches on the website
- data that decays and is going in via removePreference
The window is of, say, 6 months and every day I remove anything that's older than that, essentially ensuring that anything that goes into a model is from the past 6 months. I will be triggering one model rebuild per day.
Now, my understanding of the generation model is that the latest generation is built by aggregating all available generations up to that point. This is because addPreference and removePreference generate patches against the generation in use and save them in current + 1; triggering a model rebuild then produces generation current + 1, which contains all the data we've got so far.
All of this works fine until we reach the generation numbered model.generation.keep, as this will wipe out the initial set, which contains the majority of the data, and we're left with just the updates.
If what I've explained above is what actually happens, how do I go about configuring Oryx to do the sliding window, other than aggregating the files myself and restarting from 00000 every once in a while?
Thank you.
Created 12-12-2014 03:11 PM
model.generation.keep is really just a bookkeeping setting. It only affects how many previous generations are kept around for whatever purpose they may serve: backup, etc. Each generation has a copy of all data that's ever been seen, in aggregated form and reduced by the decay factor. So no, the default behavior is to keep all data forever. The decay factor is an indirect way to implement a "sliding window" in that it makes old data go away eventually. It's not based on a hard time or generation limit, but I think that's desirable. The closest thing is to set a decay factor and a zero threshold such that roughly the desired number of generations decays a value of "1" to below the threshold.
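To make the last point concrete, here is a rough sketch of the arithmetic. It assumes the decay is multiplicative and applied once per generation (value becomes value * factor at each rebuild), which is an assumption and not something confirmed in this thread; the function names are made up for illustration and are not part of Oryx.

```python
def decay_factor_for_window(generations: int, zero_threshold: float,
                            initial: float = 1.0) -> float:
    """Factor f such that initial * f**generations equals zero_threshold.

    Assumes one multiplicative decay step per generation (hypothetical model).
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    if not 0.0 < zero_threshold < initial:
        raise ValueError("zero_threshold must be between 0 and initial")
    return (zero_threshold / initial) ** (1.0 / generations)


def generations_until_dropped(factor: float, zero_threshold: float,
                              initial: float = 1.0) -> int:
    """Count decay steps until the value falls strictly below zero_threshold."""
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must be strictly between 0 and 1")
    if zero_threshold <= 0.0:
        raise ValueError("zero_threshold must be positive")
    value = initial
    steps = 0
    while value >= zero_threshold:
        value *= factor
        steps += 1
    return steps


if __name__ == "__main__":
    # One rebuild per day and a window of about 6 months: roughly 180 generations.
    factor = decay_factor_for_window(generations=180, zero_threshold=0.01)
    print(f"decay factor: {factor:.5f}")
    print("generations until dropped:", generations_until_dropped(factor, 0.01))
```

With one rebuild per day, a single interaction of strength 1 would fall under a threshold of 0.01 after about 180 generations. Repeated interactions accumulate a larger value and so survive longer, which is why this only approximates a hard window.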
Created 12-14-2014 01:29 PM
I am sorry, I think I was unclear. For decaying old data I just call removePreference myself so there is no need for that to be done automatically by Oryx; that part is settled.
But in some cases that I was not able to reproduce consistently, I found that subsequent generations do not contain data from a generation earlier than model.generations.keep, so the generations are not really cumulative (I did read the thread you had with Jason Chen about generations from a while back, but I posted because I noticed this different behavior). It only happens once in a while and I don't know how to trigger it. I only got it a few times in the past few days (e.g. today I was not able to reproduce it at all). I will come back if I manage to produce some reliable steps that lead to this.
I do have a question on addPreference with negative values: what happens when the value has been decreased enough to reach 0? I've tried that, and the known items still include these "nullified" items. recommend still returns results, but they all have a strength of 0, which I think is expected, as at this point we're supposed to know nothing about this user? For my particular use case, it would be best if the items reaching 0 moved out of the known items (I can do that myself by calling removePreference) and, once all of a user's items have reached 0, the user became unknown. Is there a method for this last part?
Thank you.
Created 12-14-2014 01:32 PM
I am sorry, in the first paragraph I meant that I call addPreference with -1, not removePreference (I just kick out whatever searches happened before the window).
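For reference, the daily expiry step described here could be sketched roughly as below. The search log layout and the names Search and expiring_updates are hypothetical, and the code does not call Oryx; it only works out which (user, item, -1) updates to send through addPreference on a given day.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator, NamedTuple, Tuple


class Search(NamedTuple):
    """One logged search (hypothetical record layout)."""
    user: str
    item: str
    day: date


def expiring_updates(log: Iterable[Search], today: date,
                     window_days: int = 180) -> Iterator[Tuple[str, str, float]]:
    """Yield (user, item, -1.0) for each search that leaves the window today.

    A search made exactly window_days ago drops out on today's run, so running
    this once per day cancels each original +1 exactly once.
    """
    expired_day = today - timedelta(days=window_days)
    for search in log:
        if search.day == expired_day:
            yield (search.user, search.item, -1.0)


if __name__ == "__main__":
    today = date(2014, 12, 14)
    log = [
        Search("u1", "itemA", today - timedelta(days=180)),  # expires today
        Search("u1", "itemB", today - timedelta(days=10)),   # still in window
    ]
    for update in expiring_updates(log, today):
        print(update)  # each tuple would be sent as a negative preference
```

Matching on the exact expiry day assumes the job never skips a day; a job that can miss runs would need to track the last processed day instead.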
Created 12-14-2014 01:33 PM
Hm, yes, that should not be how it works. Data that hasn't decayed or been removed will stick around forever.
If you reach 0, the user-item pair will be removed. Negative values still have a meaning, so they are not removed; what matters is how small the absolute value is. And yes, pairs that reach 0 are removed, including users that no longer have any items.
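The rule described above can be illustrated with a small in-memory toy model. This is not Oryx code: the ToyPreferences class and its zero_threshold parameter are made up for illustration, and it says nothing about when Oryx actually applies the removal.

```python
from typing import Dict


class ToyPreferences:
    """Toy aggregate of user-item strengths (hypothetical, not Oryx code)."""

    def __init__(self, zero_threshold: float = 0.0) -> None:
        self.zero_threshold = zero_threshold
        self.values: Dict[str, Dict[str, float]] = {}

    def add(self, user: str, item: str, delta: float) -> None:
        """Add delta to the pair; drop the pair, then the user, if it hits zero."""
        items = self.values.setdefault(user, {})
        new_value = items.get(item, 0.0) + delta
        if abs(new_value) <= self.zero_threshold:
            # Only the absolute value matters; negative values are kept.
            items.pop(item, None)
        else:
            items[item] = new_value
        if not items:
            # A user with no remaining items becomes unknown.
            del self.values[user]


if __name__ == "__main__":
    prefs = ToyPreferences()
    prefs.add("u1", "itemA", 1.0)
    prefs.add("u1", "itemA", -1.0)   # pair reaches 0, so u1 disappears
    prefs.add("u2", "itemB", -1.0)   # negative value is kept
    print(prefs.values)
```

In this toy model a user whose every item has been cancelled out is simply absent afterwards, which matches the behavior asked about in the question.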