
handling common issues in recommendation


New Contributor


I'm new and would like to know how the Oryx framework handles a set of common issues in recommendation:
1. the tyranny of popular products in the recommendations
2. how to handle new products, or products with low or no signal, in the recommendations
3. how to handle the problem of diversity, or over-specialization, of the recommendations
4. how to handle metadata (stock availability of a product, exposure ratio)
5. how to handle user context in the recommendations (a product already in the user's cart or already bought by the user, which we don't want to recommend any more, or at least not during his/her shopping session)
Do you have a set of links answering each of these questions?
Thanks in advance.



Re: handling common issues in recommendation

Master Collaborator

Sure, I can give you a sketch of some answers.


If you're looking at Oryx, you're looking at an approach based on matrix factorization (ALS). This tends to handle popular products fine. On the one hand, their popularity does matter, but it's incorporated in a way that doesn't dominate the result. The model wants to predict more strongly that you would interact with items that you in fact interact with a lot, but not at all costs.


A factored matrix model makes the cold-start problem pretty small. You still can't recommend to a user with zero data (well, you can always recommend most-popular items), but from the first datum you can construct an approximate user vector and therefore make some recs. The user vector can only be so good since it's based on one data point, but the model has a principled answer from the first data point.
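To make the "approximate user vector from the first datum" idea concrete, here is a minimal sketch in plain Python. It is a simplified ridge-regression fold-in against fixed item vectors, not Oryx's actual update rule. The function names, the `reg` value and the toy item factors are all made up for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fold_in_user(item_factors, interactions, reg=0.1):
    """Approximate a user vector x from a few (item, strength) pairs.

    Minimizes sum_i (strength_i - x . y_i)^2 + reg * |x|^2, holding the
    item vectors y_i fixed. This is an assumption-laden simplification.
    """
    k = len(next(iter(item_factors.values())))
    # Normal equations: (sum_i y_i y_i^T + reg * I) x = sum_i strength_i * y_i
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, strength in interactions:
        y = item_factors[item]
        for i in range(k):
            b[i] += strength * y[i]
            for j in range(k):
                a[i][j] += y[i] * y[j]
    return solve(a, b)


def score(user_vec, item_vec):
    """Predicted strength is the dot product of the two factor vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Toy 2-factor item vectors (hypothetical values).
item_factors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}

# A brand-new user with a single interaction on item "A".
user_vec = fold_in_user(item_factors, [("A", 1.0)])
ranked = sorted(
    (item for item in item_factors if item != "A"),
    key=lambda item: score(user_vec, item_factors[item]),
    reverse=True,
)
```

With one data point the vector can only point towards the item the user touched, so "B", which sits close to "A" in factor space, ranks ahead of "C". More data refines the vector in the same way.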


I don't think these models do particularly well or badly with regard to diversity. It's not always clear whether you want a lot of diversity in results or only a little. For that I think "it depends", and you'd have to try the model on your data to see how well it matches your expectations.


Metadata does not factor directly into the model. But, you can record user interactions with, say, a "category" as if it were another item being interacted with. The model is perfectly happy to do so and that's a reasonable way of adding tag-like information to the model.
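As a rough sketch of the category-as-item trick, the snippet below adds one extra interaction per (user, category) to a list of (user, item, strength) tuples. The tuple layout, the `category:` prefix and the `weight` parameter are assumptions for illustration, not an Oryx input format.

```python
def add_category_interactions(interactions, item_category, weight=0.5):
    """Return interactions plus one pseudo-item interaction per known category.

    interactions:  list of (user, item, strength) tuples
    item_category: dict mapping item ID -> category name
    weight:        down-weights the pseudo-item relative to the real item
    """
    extra = []
    for user, item, strength in interactions:
        category = item_category.get(item)
        if category is not None:
            # The prefix keeps pseudo-items distinguishable from real item IDs.
            extra.append((user, "category:" + category, strength * weight))
    return interactions + extra


interactions = [("u1", "shoe-42", 1.0), ("u1", "hat-7", 2.0)]
item_category = {"shoe-42": "shoes"}  # "hat-7" has no known category

augmented = add_category_interactions(interactions, item_category)
```

One thing to keep in mind: the pseudo-items are ordinary items as far as the model is concerned, so they can show up in recommendations and would need to be filtered out of the results (the prefix makes that easy).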


Things like stock availability tend to be filtering criteria placed on top of the final results. The caller can always handle that if desired, although it gets tricky in situations where only a few results are left after the predicate has been applied. In Oryx there is the notion of a "Rescorer", which lets you tack on server-side filtering logic. It's more work to set up but can be faster than filtering in the caller.
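For the caller-side variant, one possible way to deal with the "only a few results left" case is to over-fetch and retry with a larger request. This is just a sketch: `fetch` stands in for whatever call returns the top-n recommendations (it is not an Oryx client API), and the doubling strategy and `max_fetch` cap are arbitrary choices.

```python
def recommend_in_stock(fetch, in_stock, how_many, max_fetch=1000):
    """Return up to how_many in-stock (item, score) pairs, best first.

    fetch(n) is assumed to return the top-n (item, score) pairs, best first.
    """
    n = how_many * 2
    while True:
        candidates = fetch(n)
        kept = [(item, score) for item, score in candidates if item in in_stock]
        # Fewer candidates than requested means there is nothing more to fetch.
        exhausted = len(candidates) < n
        if len(kept) >= how_many or exhausted or n >= max_fetch:
            return kept[:how_many]
        n = min(n * 2, max_fetch)


# Stand-in for the recommender: a fixed, already ranked list.
all_recs = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5), ("f", 0.4)]

def fetch(n):
    return all_recs[:n]

result = recommend_in_stock(fetch, in_stock={"c", "e", "f"}, how_many=2)
nothing = recommend_in_stock(fetch, in_stock={"z"}, how_many=2)
```

Here the first request for 4 candidates leaves only one in-stock item, so the function asks again for 8 and then has enough. Each retry is another round trip, which is part of why doing this server-side can be faster.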


Same thing for user context -- those sound like filtering criteria. You can also use this mechanism to boost or penalize results instead of filtering.
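The filter-or-adjust idea can be sketched like this. It only illustrates the concept and is not the Oryx Rescorer interface; the function name, the `cart_penalty` factor and the `boosts` dict are hypothetical.

```python
def rescore(candidates, in_cart, already_bought, boosts=None, cart_penalty=0.5):
    """Filter and re-weight (item, score) pairs using session context.

    - items already bought are dropped entirely
    - items already in the cart are penalized rather than removed
    - boosts maps item -> multiplier for anything to promote or demote
    """
    boosts = boosts or {}
    rescored = []
    for item, score in candidates:
        if item in already_bought:
            continue  # hard filter
        if item in in_cart:
            score *= cart_penalty  # soft penalty
        score *= boosts.get(item, 1.0)
        rescored.append((item, score))
    # Scores changed, so the ranking has to be rebuilt.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


candidates = [("a", 0.9), ("b", 0.8), ("c", 0.7)]
reranked = rescore(
    candidates,
    in_cart={"a"},
    already_bought={"b"},
    boosts={"c": 1.2},
)
reranked_items = [item for item, _ in reranked]
```

In this toy run "b" disappears, "a" drops to the bottom because it is already in the cart, and "c" moves to the top. Setting `cart_penalty=0` would not remove cart items, only push them down, so use the hard filter if they must never appear during the session.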


To get started, follow the github example:


and have a look at the endpoints:


If you are interested in ALS:
