
Oryx 1 ALS computation with Hadoop

Explorer

Sean,

 

I want to confirm one thing.

In Oryx 1 ALS, there are alternating iterations to compute latent features for users and items.

 

Say, during the computation of latent features for an item, the user latent-feature matrix X (computed so far) is fixed, and the item's latent-feature vector is updated (as a least-squares regression)...
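For concreteness, the update I have in mind is roughly the per-item normal-equation solve (my own notation, ignoring the confidence weighting of the implicit-feedback case):

    $y_i = (X^\top X + \lambda I)^{-1} X^\top r_i$

where $r_i$ holds the known ratings for item $i$.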

 

In the Hadoop case, say I have 5 YARN nodes. In the implementation, is the same X matrix passed to each Hadoop node when it runs the ALS step for items (that is, are there 5 copies of X, one inside each Hadoop node)?

 

Thanks.

 

Jason


Re: Oryx 1 ALS computation with Hadoop

Master Collaborator

Yes, that's how it works in 1.x. Really, a subset of X is passed to workers depending on which rows they will need, so it's not the whole matrix. This subset is loaded into memory, so it's doing a fast but memory-hungry in-memory join.
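Schematically, the join looks something like this (a minimal sketch with made-up names, not Oryx's actual classes):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class PartitionJoinSketch {

      // One (user, item, value) rating in this worker's partition of the input.
      record Rating(long userID, long itemID, double value) {}

      // The "in-memory join": keep only the rows of X that this partition's
      // ratings actually reference, keyed by user ID.
      static Map<Long, float[]> neededRowsOfX(List<Rating> partition,
                                              Map<Long, float[]> fullX) {
        Map<Long, float[]> subset = new HashMap<>();
        for (Rating r : partition) {
          subset.put(r.userID(), fullX.get(r.userID()));
        }
        return subset; // held entirely in memory: fast, but memory-hungry
      }
    }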

Re: Oryx 1 ALS computation with Hadoop

Explorer

When it solves by the normal equations, wouldn't it need the whole X matrix to form XtX (or YtY)?

 

Re: Oryx 1 ALS computation with Hadoop

Master Collaborator

That's right, but this term can be pre-computed fairly efficiently and sent to the workers. XtX is quite small relative to X since the matrix is tall and skinny.
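For a sense of scale, here is an illustrative sketch (not Oryx's code) of accumulating XtX: with n users and k features, X is n x k but XtX is only k x k, so it is cheap to broadcast.

    class NormalEquationSketch {

      // Accumulate XtX as the sum over rows of the outer product x_u * x_u^T.
      static double[][] computeXtX(double[][] x) {
        int k = x[0].length;
        double[][] xtx = new double[k][k];
        for (double[] row : x) {
          for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
              xtx[i][j] += row[i] * row[j];
            }
          }
        }
        return xtx; // k x k (e.g. 50 x 50) no matter how many users there are
      }
    }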

Re: Oryx 1 ALS computation with Hadoop

Explorer

Thanks.

A related question.

I notice the regularization parameter lambda is weighted by r_u:

https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/f...

https://github.com/cloudera/oryx/blob/master/als-computation/src/main/java/com/cloudera/oryx/als/com...

 

This is different from Hu's original implicit-feedback paper.

Any idea why we want to weight lambda?

 

 

Re: Oryx 1 ALS computation with Hadoop

Master Collaborator

Yes, this is the 'weighted regularization' idea grafted on from another paper, http://link.springer.com/chapter/10.1007%2F978-3-540-68880-8_32

 

MLlib does it too and explains one pretty good reason for it: http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#scaling-of-the-regularization...

 

You could also say it exists so that the model doesn't heavily favor fitting the preferences of prolific users.
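Illustratively (assumed names, not Oryx's actual code), the only change in the per-user solve is that the diagonal regularization term scales with that user's rating count:

    class WeightedLambdaSketch {

      // Left-hand side of one user's normal-equation system:
      // YtY + lambda * n_u * I, where n_u is the user's number of ratings.
      static double[][] regularizedLhs(double[][] yty, double lambda, int nu) {
        int k = yty.length;
        double[][] a = new double[k][k];
        for (int i = 0; i < k; i++) {
          a[i] = yty[i].clone();
          a[i][i] += lambda * nu; // prolific users are regularized proportionally harder
        }
        return a;
      }
    }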
