How does the Oryx serving layer know there is a new model generation?

Sean,

(1) How does the Oryx serving layer know that there is a new model generation?

I traced the code, and it seems to poll a folder continuously to check the generation status.

Also, the polling delay seems to be different (a different number of minutes) when the status is "no available generation".

Is that right? Can you explain a little?

(2) While the new generation is replacing the old generation, I see a variable "recentlyActiveUsers". What is it used for?

https://github.com/cloudera/oryx/blob/85001d5fce34d66c6d9179ab6e82e1cecb3b17ee/als-serving/src/main/...

(3) A question about "loadModel(int generationID, Generation currentGeneration)": how is the old generation's model replaced by the new generation?

https://github.com/cloudera/oryx/blob/85001d5fce34d66c6d9179ab6e82e1cecb3b17ee/als-serving/src/main/...

Originally, I thought the new generation was loaded from disk into memory and the old generation in memory was then replaced by the loaded new generation. However, after tracing the code, it seems there is no such "hot swap".

Based on my tracing, assume currentGeneration=2 and generationID=3 (that is, generation 2 is about to be replaced by generation 3).

It seems that (step 1) the new model is written into currentGeneration and (step 2) the old entries left in currentGeneration are removed.

If so, it looks possible for there to be an "intermediate" model that mixes the old and new generations. Will this "intermediate" model be queried by the recommender? Can you confirm or explain a little?

Thanks.

1 ACCEPTED SOLUTION

It simply polls HDFS for new files, roughly every 5 minutes. No, that message comes from exactly this same process of refreshing the model by looking for any new model. "No available generation" means that no models have been built yet.
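As a rough illustration of the polling described above, here is a minimal sketch. It is not Oryx's actual code: the class GenerationPoller, its methods, and the layout (one subdirectory per generation, named by a numeric ID) are assumptions made for the example, and a local directory stands in for HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;
import java.util.stream.Stream;

// Hypothetical sketch, not Oryx's actual code. A local directory stands in for HDFS,
// and each generation is ASSUMED to live in a subdirectory named by its numeric ID.
class GenerationPoller {

  // Assumed interval, taken from the "roughly every 5 minutes" figure above.
  static final long POLL_INTERVAL_MINUTES = 5;

  private final Path root;
  private int currentGeneration = -1; // -1 means no generation has been loaded yet

  GenerationPoller(Path root) {
    this.root = root;
  }

  // Returns the highest numeric subdirectory name, or empty if there is none.
  static OptionalInt latestGeneration(Path root) throws IOException {
    try (Stream<Path> children = Files.list(root)) {
      return children
          .filter(Files::isDirectory)
          .map(p -> p.getFileName().toString())
          .filter(name -> name.matches("\\d+"))
          .mapToInt(Integer::parseInt)
          .max();
    }
  }

  // One polling pass: returns true if a newer generation was found and adopted.
  boolean pollOnce() throws IOException {
    OptionalInt latest = latestGeneration(root);
    if (!latest.isPresent()) {
      System.out.println("No available generation"); // no model has been built yet
      return false;
    }
    if (latest.getAsInt() > currentGeneration) {
      currentGeneration = latest.getAsInt();
      return true; // a real server would start loading the new model here
    }
    return false;
  }

  int currentGeneration() {
    return currentGeneration;
  }
}
```

In a long-running server, pollOnce() could be scheduled with something like ScheduledExecutorService.scheduleWithFixedDelay, using POLL_INTERVAL_MINUTES as the delay.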

There's a delay between the time new data arrives (which could include a new user or item) and the time it is incorporated into a model. The delay could be long, depending on how long your models take to build. When a new model arrives, you can't just drop all existing users, since the new model won't have any information about very new users or items. recentlyActiveUsers helps keep track of which users/items should be retained in memory even if they do not exist in the new model.
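The retention rule above can be sketched as follows. This is only an illustration under assumptions: RetentionSketch and pruneStale are hypothetical names, and the real Oryx bookkeeping may differ in detail.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the retention rule described above, not Oryx's actual code.
class RetentionSketch {

  // Removes in-memory vectors that the new model does not contain,
  // EXCEPT for IDs that were recently active (for example, users who
  // arrived after the new model's training data was collected).
  static void pruneStale(Map<String, float[]> inMemory,
                         Set<String> idsInNewModel,
                         Set<String> recentlyActive) {
    inMemory.keySet().removeIf(
        id -> !idsInNewModel.contains(id) && !recentlyActive.contains(id));
  }
}
```

With this rule, a user who shows up while the next model is still being built keeps an in-memory vector after the swap, even though the new model has never seen that user.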

The new model replaces the old one user by user and item by item, rather than by loading an entire new model at once. Yes, you do have a state with old and new data at the same time, but this is fine for recommendations; they're not incompatible. They are just the current and the newer state of an estimate of the user/item vectors.
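To make the mixed state concrete, here is a minimal sketch of entry-by-entry replacement. It is not Oryx's actual code: IncrementalSwapSketch, mergeInto, and score are hypothetical names, and the sketch assumes both generations use the same number of features.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch, not Oryx's actual code. Vectors are overwritten one entry
// at a time in concurrent maps, so readers may see a mix of old and new generations.
class IncrementalSwapSketch {

  final ConcurrentMap<String, float[]> userVectors = new ConcurrentHashMap<>();
  final ConcurrentMap<String, float[]> itemVectors = new ConcurrentHashMap<>();

  // Overwrites entries one by one; there is no global lock and no whole-model swap.
  static void mergeInto(ConcurrentMap<String, float[]> live, Map<String, float[]> newer) {
    for (Map.Entry<String, float[]> e : newer.entrySet()) {
      live.put(e.getKey(), e.getValue());
    }
  }

  // Estimated preference: dot product of a user vector and an item vector.
  // ASSUMES both IDs are present and both vectors have the same length.
  float score(String user, String item) {
    float[] u = userVectors.get(user);
    float[] v = itemVectors.get(item);
    float sum = 0.0f;
    for (int i = 0; i < u.length; i++) {
      sum += u[i] * v[i];
    }
    return sum;
  }
}
```

If the new user vectors have been merged but the new item vectors have not yet been, score() still returns a usable estimate: it simply combines a newer user vector with an older item vector.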
