Support Questions

Jason.Chen · ‎10-25-2014

Sean,

We are investigating ways to retrieve and modify the latent feature vectors on the fly.
One practical case is resolving the "cold start" problem.
For example, given a new item without any user-item associations, we want to "approximate" the item's latent vector.
One idea is to take the new item and compare its similarity (attribute-based similarity, not latent feature similarity) to other items, then use the k-NN items' latent vectors to approximate the new item's "latent" vector. Basically, it's a k-NN based approach.
There could be other approaches. Anyway, let's say the new item's latent vector is estimated somehow. Then we want to "insert" this new entry into the existing item latent vectors (the Y matrix). I understand there is no end-point API to do that. However, is it feasible to work around this, say, at the Java jar level (computation/serving jar), by calling getY, modifying the Y matrix, and "saving" it back? Any suggestions are welcome.

Thanks.
Jason

srowen · ‎10-26-2014

It is not hard to expose, but seems like an internal implementation detail. The implementation already solves the cold start problem in a different way with fold-in. One issue with what you're suggesting is that there is no notion of attributes in the model. I assume you mean you have that externally. I understand the logic but it's a fairly different recommender model that you're making then. I think I'd direct you to just hack the code a bit. But I can keep this in mind in case several other use cases pop up that would make it make sense to just let the item vectors be set externally.

The oryx2 design is much more decomposed so you could put in another process that feeds any item/user updates you want onto a queue of updates. But this is a ways from being ready.

srowen · ‎03-29-2015

Yes, that's right.

Jason.Chen · ‎12-06-2014

Sean,

To follow up on this, I would like to get your suggestions on how to hack the code a bit.

The goal is to get the latent features for items and users. Where is a good starting point?

Thanks.

srowen · ‎12-07-2014

I think you can perhaps see in the servlets under als-serving/ how they access the data structure for a generation, which includes a map from IDs to float[] (latent feature vectors) in memory. You could just clone how one of the servlets works and is initialized, and change it to return feature vectors.
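As a rough illustration of the idea above (not the actual Oryx code), here is a minimal sketch of a lookup that returns a feature vector for an ID. The class `FeatureVectorLookup`, its methods, and the use of a plain `Map<String, float[]>` are all hypothetical stand-ins; the real generation data structure and servlet wiring in als-serving/ will differ.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the in-memory generation data: ID -> latent feature vector.
final class FeatureVectorLookup {

  private final Map<String, float[]> vectors;

  FeatureVectorLookup(Map<String, float[]> vectors) {
    this.vectors = vectors;
  }

  /** Returns a copy of the vector for the ID, or null if the ID is unknown. */
  float[] get(String id) {
    float[] v = vectors.get(id);
    // Copy so callers cannot mutate the model's own array
    return v == null ? null : Arrays.copyOf(v, v.length);
  }

  /** Formats "id,f1,f2,..." as a cloned servlet might write it to the response. */
  String toCsv(String id) {
    float[] v = get(id);
    if (v == null) {
      return null;
    }
    StringJoiner line = new StringJoiner(",");
    line.add(id);
    for (float f : v) {
      line.add(Float.toString(f));
    }
    return line.toString();
  }
}
```

A cloned servlet would do roughly the same thing: look up the ID in the current generation's map and write the vector out, returning a 404-style response when the ID is unknown.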

Jason.Chen · ‎12-09-2014

Sean,

Thanks.

Ya, I figured it out and am able to get latent features for users using some function calls (e.g., getCurrentGeneration(), getX(), getIDMapping()).

It seems latent vector retrieval is fine. However, I cannot figure out how to append a latent vector to matrix X. Say there is a new user, and we have an external routine that figures out his latent vector (based on kNN over user profiles and the latent features from Oryx). Now I want to "append" the new latent vector to matrix X and also to idMapping. Any guidance on how to do this is appreciated.

Thanks.

srowen · ‎12-09-2014

It's a bit complex due to all the locks (2.x is simpler in this regard) but you should be able to trace the logic from something like PreferenceServlet, which can add new users/items to the data structures.

Jason.Chen · ‎12-10-2014

PreferenceServlet adds user-item preferences.

For cold-start users, there is no such info.

I am wondering if there is a way to "add" latent vectors (to the X matrix) directly without breaking other related data structures.

Thanks.

srowen · ‎12-11-2014

When you add a user-item association, you have at least 1 data point for the user and item!

Before that, you have no info at all. You can't make any recommendations no matter what approach you use.

You can add feature vectors directly, sure, but how would you know what to add?

Jason.Chen · ‎12-11-2014

Sean,

We are trying to handle the cold start user case.

This implies there is no known user-item association.

Our approach uses the users' profiles.

(1) Given a new user u, find the kNN users based on profile similarity.

(2) Use these kNN users' latent vectors (from the X matrix) to approximate the latent vector for new user u.
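Steps (1) and (2) can be sketched roughly as follows. This is only an illustration under assumptions: profiles are numeric attribute vectors, similarity is cosine, and the estimate is an unweighted mean of the neighbors' latent vectors. The class `KnnColdStart` and both maps are hypothetical and not part of Oryx.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: estimates a latent vector for a new user from profile neighbors.
final class KnnColdStart {

  /** Cosine similarity between two profile vectors of equal length. */
  static double cosine(double[] a, double[] b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
  }

  /**
   * @param newProfile profile of the cold-start user
   * @param profiles   known users' profiles, keyed by user ID
   * @param latent     known users' latent vectors (rows of X), keyed by user ID
   * @param k          number of neighbors to average
   */
  static float[] approximate(double[] newProfile,
                             Map<String, double[]> profiles,
                             Map<String, float[]> latent,
                             int k) {
    // Step (1): score every user that has both a profile and a latent vector
    List<Map.Entry<String, Double>> scored = new ArrayList<>();
    for (Map.Entry<String, double[]> e : profiles.entrySet()) {
      if (latent.containsKey(e.getKey())) {
        scored.add(Map.entry(e.getKey(), cosine(newProfile, e.getValue())));
      }
    }
    // Most similar first
    scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
    int n = Math.min(k, scored.size());
    if (n == 0) {
      throw new IllegalArgumentException("no neighbors with latent vectors");
    }

    // Step (2): average the latent vectors of the n nearest neighbors
    int features = latent.get(scored.get(0).getKey()).length;
    float[] estimate = new float[features];
    for (int i = 0; i < n; i++) {
      float[] v = latent.get(scored.get(i).getKey());
      for (int f = 0; f < features; f++) {
        estimate[f] += v[f] / n;
      }
    }
    return estimate;
  }
}
```

A similarity-weighted mean would be a natural variation; the plain mean is used here only to keep the sketch short.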

So, the scenario is that we "already" know the latent vector to add. The question becomes: how do we add the feature vector directly to Oryx?

I am afraid of breaking the internal data structures and the approximation and re-scoring logic. That's why I am looking for your guidance/suggestions in the given scenario, where we know what latent vector to "add".

One workaround I am thinking of is to simulate a dummy user-item association for the new user and add that dummy preference to Oryx. Then, after the kNN-based latent vector computation, I would "modify" (not "add") that new user's latent vector in the existing X matrix.

Thanks.

srowen · ‎12-11-2014

This may be semantics. If you have no data, you can't make any recommendation, so we must be talking about starting from some data. In the normal case there is only user-item interaction data, but you have this side information you're incorporating before the first interaction. OK. You can modify the code to simply add the new entry to the map containing IDs and feature vectors. What's the issue with that? I assume you're already modifying the code. Yes you need to be careful about the locks but there is not much else to know.
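To make the point about locks concrete, here is a minimal sketch of adding an entry to an ID-to-vector map under a read/write lock. It is an assumption-laden stand-in, not Oryx's code: the class `LockedVectorStore`, the `HashMap`, and the single `ReentrantReadWriteLock` are hypothetical, and the real serving layer has its own map types and locking layout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a map of IDs to latent feature vectors guarded by a lock.
final class LockedVectorStore {

  private final Map<String, float[]> vectors = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  /** Readers (e.g. recommendation requests) share the read lock. */
  float[] get(String id) {
    lock.readLock().lock();
    try {
      float[] v = vectors.get(id);
      return v == null ? null : v.clone();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Adds a new entry or overwrites an existing one while holding the write lock. */
  void put(String id, float[] vector) {
    // Copy outside the lock so the critical section stays short
    float[] copy = vector.clone();
    lock.writeLock().lock();
    try {
      vectors.put(id, copy);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Because `put` both inserts and overwrites, the same path would cover the "add" case and the "modify after a dummy preference" workaround mentioned earlier in the thread.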

Cloudera Community

Retrieve and modify latent feature vectors on the fly?