About srowen

srowen · 03-05-2014

Ha, you mean, is it memoryless? Yes. You can just ship the whole generation directory around. To roll back to an earlier generation, just delete the newer ones and restart the serving layer, and so on.
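Rolling back by hand means deleting every generation directory newer than the one you want to keep, then restarting the serving layer. The sketch below assumes, hypothetically, that generations are subdirectories with zero-padded numeric names (like "00003") under one instance directory. Check your actual layout before deleting anything. The restart step is not shown.

```python
import shutil
import tempfile
from pathlib import Path


def roll_back_generations(instance_dir: Path, keep_through: int) -> list:
    """Delete generation subdirectories numbered higher than keep_through.

    Hypothetical layout: each generation is a subdirectory whose name is a
    zero-padded integer such as "00003". Other entries are left untouched.
    Returns the names of the directories that were removed.
    """
    removed = []
    for child in sorted(instance_dir.iterdir()):
        if child.is_dir() and child.name.isdigit() and int(child.name) > keep_through:
            shutil.rmtree(child)  # drop the newer generation entirely
            removed.append(child.name)
    return removed


# Demo on a throwaway directory: roll back to generation 1.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["00000", "00001", "00002", "00003", "notes"]:
        (root / name).mkdir()
    removed = roll_back_generations(root, keep_through=1)
    remaining = sorted(p.name for p in root.iterdir())

print(removed)    # ['00002', '00003']
print(remaining)  # ['00000', '00001', 'notes']
```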

srowen · 03-05-2014

The project uses Maven for builds. Eclipse supports Maven builds directly, so it supports this project too. (If there are any problems, they are in Eclipse's Maven integration.) I use IntelliJ and recommend it, but there is nothing additional to say here about working with Eclipse. It just works, because the Maven build works and Eclipse uses Maven builds. Have you modified the build? For example, you say that you had to add the avro plugin to pluginManagement, but it is certainly already there in the parent POM: https://github.com/cloudera/oryx/blob/master/pom.xml#L671 You say there is a compile error, but the dependencies you seem to be missing are already correctly specified in the POM. That is, you can verify that the build is correct with "mvn compile". You need to clarify what you are doing, because I don't think you are building the project as-is.

srowen · 03-05-2014

It would be helpful to know what errors you are getting. It should build fine with Maven, as you show.

srowen · 02-18-2014

Hard filtering rules need to be implemented in a RescorerProvider, or in logic on the caller side. Tagging users and items with a locale could make sense. It would function as a soft filter, nudging people towards things in the same locale. That could be useful as well, but it is a different thing from implementing business rules. If your items and users are nearly completely disjoint by locale (e.g. very few items are available in multiple locales and very few users shop in multiple locales), then separate models might be the best way to go. No filtering logic is needed, although you then manage one model per locale. But the models are smaller and easier to handle. If there is moderate overlap, then a unified model can benefit from the cross-locale learning.
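To make the hard-versus-soft distinction concrete, here is a minimal sketch in plain Python. It is not the Oryx RescorerProvider interface, and the names (`Item`, `hard_filter`, `soft_boost`, the `boost` factor) are hypothetical. The post describes the soft effect as coming from tagging users and items with a locale in the input data; the multiplicative boost below is a simpler stand-in used only to contrast the two behaviors.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    locale: str
    score: float  # raw recommender score, assumed already computed


def hard_filter(items, allowed_locale):
    """Business rule: drop anything outside the allowed locale, no exceptions."""
    return [it for it in items if it.locale == allowed_locale]


def soft_boost(items, user_locale, boost=1.2):
    """Soft preference (hypothetical stand-in): same-locale items get a
    multiplicative nudge, but other items can still rank highly."""
    return [
        Item(it.item_id, it.locale, it.score * boost if it.locale == user_locale else it.score)
        for it in items
    ]


def top(items, n):
    """IDs of the n highest-scoring items."""
    return [it.item_id for it in sorted(items, key=lambda it: it.score, reverse=True)[:n]]


candidates = [
    Item("a", "de", 0.90),
    Item("b", "fr", 0.95),
    Item("c", "de", 0.50),
    Item("d", "fr", 0.40),
]

hard = top(hard_filter(candidates, "de"), 3)
soft = top(soft_boost(candidates, "de"), 3)
print(hard)  # ['a', 'c']: the "fr" items can never appear
print(soft)  # ['a', 'b', 'c']: strong "fr" item "b" still gets through
```

The hard filter guarantees the rule holds. The soft version only shifts the ranking, which is why it cannot replace business rules.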

srowen · 02-13-2014

Yes, that's the same me. Please have a look at the project page for some simple examples to get started: https://github.com/cloudera/oryx There is no Mongo integration per se, but you could always hack on it.

srowen · 02-10-2014

That's good, although I am still not sure why it worked fine for me with quite different params. The transformation should not have done much. It could be that the singularity tolerance is too strict, but I doubt it. There's going to be a fairly big rewrite of the computation, for example to use Spark in some parts. As part of that, I am going to build evaluation into the pipeline itself, so that it's always tuning as it goes. It's not going to come out soon (it's just in the design phase), but the idea is that this should not be something anyone has to do by hand. For practical purposes, I would just proceed with these params for now and return to the idea of optimization later. I am guessing (?) your real data set is different anyway and would require different params. Or, for this data set, you could use the local build.
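The post does not describe how Oryx actually performs its rank check. As a hedged illustration of what a "singularity tolerance" does, here is a numerical-rank estimate using modified Gram-Schmidt. A column counts toward the rank only if what remains after removing its projection onto earlier columns exceeds a relative tolerance. `numerical_rank` and `rel_tol` are hypothetical names. A larger (stricter) tolerance declares more matrices rank-deficient.

```python
import math


def numerical_rank(columns, rel_tol=1e-6):
    """Estimate the rank of a matrix given as a list of column vectors.

    Each column is orthogonalized against the basis built so far (modified
    Gram-Schmidt). It counts toward the rank only if its residual norm is
    larger than rel_tol times the largest original column norm.
    """
    if not columns:
        return 0
    scale = max(math.sqrt(sum(x * x for x in col)) for col in columns)
    if scale == 0.0:
        return 0
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > rel_tol * scale:
            basis.append([x / norm for x in v])  # keep a unit-length direction
    return len(basis)


# The third column is almost a combination of the first two.
cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1e-4]]
print(numerical_rank(cols, rel_tol=1e-6))  # 3: the tiny third direction is kept
print(numerical_rank(cols, rel_tol=1e-3))  # 2: a stricter tolerance discards it
```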

srowen · 02-10-2014

It's "normal" for this result to happen if the parameters are way out of kilter for the data set. I suppose it tends to be easier for that to happen with small data. So whether it's reproducing a problem depends on the data. But if you think the params are quite reasonable for the data and you see this, yes please send it to me.

srowen · 02-09-2014

This is good. There is no performance difference between computing 10 and 100 recommendations, since it still considers all non-filtered items each time. (OK, I suppose it takes a tiny bit longer to send 100 results over the network than 10.) The results are not precomputed; they are computed on the fly each time.
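Here is a minimal sketch of why the number requested barely matters. Every non-excluded item is scored on each request, and only a small heap of the best results is kept. It assumes dot-product scoring between a user vector and item vectors. It illustrates the behavior described, not Oryx's actual code, and `recommend` and its parameters are hypothetical.

```python
import heapq


def recommend(user_vec, item_vecs, how_many, exclude=frozenset()):
    """Score every non-excluded item, then keep only the best how_many.

    The loop over item_vecs is the same whatever how_many is; only the
    small heap of survivors grows, so asking for 100 instead of 10 adds
    almost nothing compared with scoring all items.
    """
    scored = (
        (sum(u * v for u, v in zip(user_vec, vec)), item_id)
        for item_id, vec in item_vecs.items()
        if item_id not in exclude
    )
    return [item_id for _, item_id in heapq.nlargest(how_many, scored)]


items = {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [0.5, 0.5], "i4": [0.9, 0.2]}
user = [1.0, 0.1]
print(recommend(user, items, 2))                  # ['i1', 'i4']
print(recommend(user, items, 2, exclude={"i1"}))  # ['i4', 'i3']
```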

srowen · 02-08-2014

Oops, fixed. Yes, I'm using CDH5b1 too, so that's not a difference. Can you compile from HEAD to make sure we're synced up there? You may already be; I'm just checking. I can make a binary too. Any logs would be of interest for sure. I would suggest trying again with clearly small values for features (like 10) and clearly small values for lambda (like 0.0001) to see if that at least works. I would expect a lower number of features might be appropriate, given that there is a smallish number of items. You might try the optimizer again with lower ranges for both. More features encourage overfitting and more lambda encourages underfitting, so they kind of counteract each other. It's possible you will find a better value when both are low.
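To see why features and lambda pull in opposite directions, here is a generic regularized matrix-factorization objective of the kind ALS minimizes. It is an illustration only, not necessarily the exact loss Oryx uses. Implicit-feedback variants add per-entry weights, and lambda may be scaled differently. Here k is the number of features, λ is lambda, and x_u and y_i are the user and item feature vectors (rows of X and Y). 𝒦 is the set of observed user-item pairs.

```latex
\min_{X,\,Y}\; \sum_{(u,i)\in\mathcal{K}} \bigl(r_{ui} - \mathbf{x}_u^{\top}\mathbf{y}_i\bigr)^2
  \;+\; \lambda \Bigl( \sum_u \lVert \mathbf{x}_u \rVert^2 + \sum_i \lVert \mathbf{y}_i \rVert^2 \Bigr),
  \qquad \mathbf{x}_u,\ \mathbf{y}_i \in \mathbb{R}^{k}
```

Raising k gives every user and item more free parameters. That lets the first term be driven down by fitting noise, which is overfitting. Raising λ makes large feature vectors more expensive and shrinks them toward zero, which is underfitting. Searching with both kept low explores the region where neither effect dominates.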

srowen · 02-08-2014

Yes, that explains why you didn't see the same initial problem. Well, it's good that it was fixed anyhow. Text vs. numeric IDs shouldn't matter at all; underneath, they are both hashed. It looks like the amount of data and its nature are the same if it's just that the IDs were hashed. I can't imagine collisions are an issue. I tried converting these 1-1 to alphanumeric IDs, and it worked for me. You are using CDH 4.x vs. 5, right? That could be a difference, but I still wouldn't quite expect the problem to take this form. Anything else of interest in the logs? You're welcome to send me all of it. Are you starting from scratch when you run the test?
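Here is a hedged sketch of why text and numeric IDs should behave the same once hashed, and why collisions are very unlikely. It assumes IDs are hashed to 64-bit integers. The MD5-based `id_to_long` is a hypothetical stand-in, not Oryx's actual hash function. The collision estimate is the standard birthday-bound approximation.

```python
import hashlib
import math


def id_to_long(raw_id: str) -> int:
    # Hypothetical stand-in hash: first 8 bytes of MD5, read as a signed
    # 64-bit integer. Oryx's real hashing may use a different function.
    digest = hashlib.md5(raw_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def collision_probability(n_ids: int, bits: int = 64) -> float:
    # Birthday bound: P(at least one collision) is approximately
    # 1 - exp(-pairs / 2^bits), where pairs = n(n-1)/2.
    # expm1 keeps precision for very small values.
    pairs = n_ids * (n_ids - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


print(id_to_long("12345") == id_to_long("12345"))  # True: hashing is deterministic
print(f"{collision_probability(10_000_000):.1e}")  # 2.7e-06 for ten million IDs
```

Under these assumptions, even ten million distinct IDs have only a few-in-a-million chance of any collision at all.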

Last Visited	02-06-2015 02:06 PM

Member Since	07-29-2013 08:58 AM
Posts	366
Kudos received	62

Cloudera Community

Re: CDH 5.6

Re: How to use Oryx 1 to detect spam email

Re: Spark program in eclipse

Re: Graphx in latest CDH

Re: Maturity ORYX

Re: Is the ALS model markovian?

Re: How to build oryx in eclipse

Re: How to build oryx in eclipse

Re: USER and ITEM groups: How best to handle in Or...

Re: how to running my mahout application in cdh4

Re: Oryx ALS: X and Y do not have sufficient rank

Re: Oryx ALS: X and Y do not have sufficient rank

Re: Oryx: API method unavailable until model has b...

Re: Oryx ALS: X and Y do not have sufficient rank

Re: Oryx ALS: X and Y do not have sufficient rank