About srowen

srowen · ‎12-07-2014

No, I see it finish at 6 iterations with MAP about 0.15 on Hadoop. Same data set? Double-check that you have the latest build, and maybe start from scratch with no other intermediate results.

srowen · ‎12-07-2014

OK, I'm pretty certain I found the bug and fixed it here: https://github.com/cloudera/oryx/commit/437df94d0b1c9d27b5c9f3b984b98973237d6f99 The factorization works as expected now. Are you able to test it too?

srowen · ‎12-07-2014

I think you can perhaps see in the servlets under als-serving/ how they access the data structure for a generation, which includes a map from IDs to float[] (latent feature vectors) in memory. You could just clone how one of the servlets works and is initialized, and change it to return feature vectors.

srowen · ‎12-05-2014

I'm certain it's nothing to do with the input itself. It looks fine, and those types of problems would show up differently.

srowen · ‎12-05-2014

Strange, I do indeed get much different answers on the Hadoop version and they don't look quite right. The first row and column are very small and there's no good reason for that. I'll keep digging to see where things go funny. The fact that MAP is good suggests that the model is good during iteration but something funny happens at the end.
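
For readers unfamiliar with the metric, here is a minimal sketch of mean average precision (MAP) over held-out data. It uses one common textbook definition (average precision normalized by the number of relevant items); the exact normalization Oryx applies may differ, and the function names and sample data are made up for illustration.

```python
def average_precision(recommended, relevant):
    """Average precision for one user.

    recommended: ranked list of item IDs (best first).
    relevant: held-out item IDs the user actually interacted with.
    Normalizing by len(relevant) is one common convention, assumed here.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(recs_by_user, held_out_by_user):
    """Mean of average_precision over every user that has held-out items."""
    users = list(held_out_by_user)
    if not users:
        return 0.0
    total = sum(
        average_precision(recs_by_user.get(u, []), held_out_by_user[u])
        for u in users
    )
    return total / len(users)


# Hypothetical sample data, not taken from the thread.
sample_recs = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
sample_held_out = {"u1": ["a", "c"], "u2": ["z"]}
sample_map = mean_average_precision(sample_recs, sample_held_out)
```

A score near zero means held-out items rarely appear near the top of the recommendations, which is why a good MAP during iteration points away from the model itself.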

srowen · ‎12-05-2014

You are right that it's unlikely that the earlier computations would work if the data were low rank. OK, synthetic data is ruled out. It's not quite X or Y that is singular or nonsingular, it's X'*X and Y'*Y. Small absolute values in the matrices are normal.
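
To make the distinction concrete, here is a small pure-Python sketch (not Oryx code) that forms X'*X and checks its numerical rank. The matrices, helper names, and the relative tolerance are assumptions chosen for illustration. It shows that a linearly dependent feature column makes X'*X singular, while uniformly tiny entries do not.

```python
def gram(m):
    """Return M' * M for a matrix given as a list of rows."""
    cols = len(m[0])
    return [
        [sum(row[i] * row[j] for row in m) for j in range(cols)]
        for i in range(cols)
    ]


def rank(m, rel_tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting.

    A pivot counts as zero when it is below rel_tol times the largest
    absolute entry, so uniformly small matrices are not misjudged.
    """
    a = [[float(v) for v in row] for row in m]
    rows, cols = len(a), len(a[0])
    tol = rel_tol * max(abs(v) for row in a for v in row)
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            for j in range(c, cols):
                a[i][j] -= factor * a[r][j]
        r += 1
    return r


# Third feature column equals the sum of the first two: X'*X is singular.
x_dependent = [[1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]]

# Tiny values but independent columns: X'*X is still nonsingular.
x_small = [[1e-3, 0, 0], [0, 1e-3, 0], [0, 0, 1e-3], [1e-3, 1e-3, 0]]

rank_dependent = rank(gram(x_dependent))
rank_small = rank(gram(x_small))
```

With 3 features, `rank_dependent` falls short of 3 while `rank_small` does not, which matches the point that small magnitudes alone are not the problem.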

srowen · ‎12-05-2014

Hm, I wonder if the jobserver just needs to be updated in the VM. You could try building and running your own. I've not used the jobserver myself. Someone else may have more insight or it may be a good question for the VM forum.

srowen · ‎12-05-2014

Is this data synthetically generated? I'm also wondering if somehow it really does have rank less than about 6. That's possible for large data sets if they were algorithmically generated.

srowen · ‎12-05-2014

Got it, spark-submit works, but you want to use the jobserver. This could be my ignorance but I thought you still had to build and install the jobserver yourself. Hue doesn't seem to have it in CDH 5.2 but I haven't looked at the VM in a while. Are you building jobserver yourself or no?

srowen · ‎12-05-2014

Yes, that's fine then. You do not need to build against the CDH artifacts, even. You do need to use spark-submit. Could it be an issue with the jobserver? How are you deploying that, and is it consistent with your CDH installation?

Last Visited	‎02-06-2015 02:06 PM

Member Since	‎07-29-2013 08:58 AM
Posts	366
Kudos received	62

Cloudera Community

Re: CDH 5.6

Re: How to use Oryx 1 to detect spam email

Re: Spark program in eclipse

Re: Graphx in latest CDH

Re: Maturity ORYX

Re: Oryx ALS: Hadoop computation yields MAP 0.00x,...

Re: Oryx ALS: Hadoop computation yields MAP 0.00x,...

Re: Retrieve and modify latent feature vectors on ...

Re: Oryx ALS: Hadoop computation yields MAP 0.00x,...

Re: Oryx ALS: Hadoop computation yields MAP 0.00x,...

Re: Oryx ALS: Hadoop computation yields MAP 0.00x,...

Re: NoSuchMethodError when submitting Spark jobs w...

Re: Oryx ALS: Hadoop computation yields MAP 0.00x,...

Re: NoSuchMethodError when submitting Spark jobs w...

Re: NoSuchMethodError when submitting Spark jobs w...