08-07-2014 05:42 AM
08-07-2014 12:09 PM
Sure, I can give you a sketch of some answers.
If you're looking at Oryx, you're looking at an approach based on matrix factorization (ALS). This tends to handle popular products fine. Their popularity does matter, but it's incorporated in a way that doesn't dominate the result. The model tries to predict more strongly that you will interact with items you in fact interact with a lot, but not at all costs.
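For reference, here is one common way to write the objective for ALS on implicit feedback. I'm assuming this is close in spirit to what is used here, not quoting the implementation; the symbols (c, p, r, alpha, lambda) are just my notation. p_ui is 1 if user u interacted with item i and 0 otherwise, r_ui is the interaction strength, and x_u, y_i are the user and item vectors:

```latex
\min_{X,Y} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad c_{ui} = 1 + \alpha\, r_{ui}
```

Heavily used items get a larger confidence weight c_ui, so the model tries harder to fit them, but the regularization term and the low rank of the factorization keep them from taking over. That is the "not at all costs" part.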
A factored matrix model makes the cold-start problem pretty small. You still can't recommend to a user with zero data (well, you can always recommend most-popular items), but from the first datum you can construct an approximate user vector and therefore make some recommendations. That vector can only be so good since it's based on one data point, but the model does have a principled answer right away.
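To make the "approximate user vector" idea concrete, here is a minimal sketch of one standard way to do it: hold the trained item vectors fixed and solve a small regularized least-squares problem for the new user. This is not Oryx's actual code; the function name `fold_in_user` and the `reg` and `alpha` values are assumptions for illustration, and the item factors are random stand-ins for a trained model.

```python
import numpy as np

def fold_in_user(item_factors, interactions, reg=0.1, alpha=40.0):
    """Approximate a user vector from a few interactions, item factors held fixed.

    item_factors: (n_items, k) array from an already-trained model.
    interactions: dict mapping item index -> interaction strength.
    reg, alpha:   illustrative hyperparameters, not values from any real system.
    """
    Y = item_factors
    k = Y.shape[1]
    # Baseline: every item is a "no interaction" observation with confidence 1.
    A = Y.T @ Y + reg * np.eye(k)
    b = np.zeros(k)
    for item, strength in interactions.items():
        conf = 1.0 + alpha * strength
        y = Y[item]
        # Raise this item's confidence from 1 to conf; its target preference is 1.
        A += (conf - 1.0) * np.outer(y, y)
        b += conf * y
    return np.linalg.solve(A, b)

# Stand-in for trained item factors: 50 items, 8 features.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 8))

# A brand-new user with a single interaction, on item 7.
user_vec = fold_in_user(Y, {7: 1.0})
scores = Y @ user_vec
top = np.argsort(-scores)[:5]  # indices of the 5 highest-scoring items
```

With one data point the vector mostly points toward items similar to the one the user touched, which is about the best you can hope for at that stage.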
I don't think these models do particularly well or badly with regard to diversity. It's not always clear whether you want a lot of diversity in the results or only a little. The answer there is "it depends": you'd have to try the model on your data to see how well it matches your expectations.
Metadata does not factor directly into the model. But you can record user interactions with, say, a "category" as if it were another item being interacted with. The model is perfectly happy to do so, and that's a reasonable way of adding tag-like information to the model.
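As a rough sketch of the category trick, you could augment the interaction data before it reaches the model. Everything here is hypothetical: the `(user, item, strength)` tuple layout, the `category:` prefix, and the `weight` value are my own choices, not an Oryx input format.

```python
def with_category_events(events, item_category, weight=0.5):
    """Return the events plus one synthetic event for each original event
    whose item has a known category.

    events:        list of (user_id, item_id, strength) tuples.
    item_category: dict item_id -> category name (hypothetical metadata source).
    weight:        how much a category interaction counts relative to the item's.
    """
    out = list(events)
    for user, item, strength in events:
        category = item_category.get(item)
        if category is not None:
            # The prefix keeps category IDs from colliding with real item IDs.
            out.append((user, "category:" + category, strength * weight))
    return out

events = [("u1", "sku-101", 1.0), ("u1", "sku-202", 2.0), ("u2", "sku-303", 1.0)]
item_category = {"sku-101": "shoes", "sku-202": "shoes"}
augmented = with_category_events(events, item_category)
```

Since the categories now look like items to the model, you would probably want to filter the `category:` IDs back out of any recommendations you return.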
Things like stock availability tend to be filtering criteria placed on top of the final results. The caller can always handle that if desired, although it gets tricky when only a few results are left after the predicate has been applied. In Oryx there is a notion of a "Rescorer", which lets you tack on server-side filtering logic. It's more work to set up, but it can be faster than filtering in the caller.
Same thing for user context -- those sound like filtering criteria. You can also use this mechanism to boost or penalize results instead of filtering.
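Here is a small sketch of the filter-or-boost idea, done on the caller's side. It is not the Oryx Rescorer interface; the `rescore` helper, the "return NaN to drop an item" convention, and the sample data are assumptions made up for this example.

```python
import math

def rescore(candidates, rescorer, how_many):
    """Apply a rescorer to (item_id, score) pairs and return the top results.

    rescorer: function (item_id, score) -> new score, or math.nan to drop the item.
    """
    kept = []
    for item, score in candidates:
        new_score = rescorer(item, score)
        if not math.isnan(new_score):
            kept.append((item, new_score))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:how_many]

in_stock = {"a", "c", "d"}   # hypothetical availability data
on_promotion = {"d"}         # hypothetical boost criterion

def stock_and_promo(item, score):
    if item not in in_stock:
        return math.nan                      # filter: out of stock
    return score * 1.5 if item in on_promotion else score  # boost promoted items

# Ask the recommender for more candidates than you need, then cut down.
candidates = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.65)]
top = rescore(candidates, stock_and_promo, how_many=2)
```

Requesting more candidates than you finally show is the usual way around the "only a few results left" problem when filtering in the caller. Doing the same work on the server avoids that over-fetching, which is where the speed advantage comes from.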
To get started, follow the GitHub example:
and have a look at the endpoints:
If you are interested in ALS: