Member since
07-29-2013
366
Posts
69
Kudos Received
71
Solutions
My Accepted Solutions
Views | Posted |
---|---|
4729 | 03-09-2016 01:21 AM |
4088 | 03-07-2016 01:52 AM |
12954 | 02-29-2016 04:40 AM |
3806 | 02-22-2016 03:08 PM |
4791 | 01-19-2016 02:13 PM |
12-14-2014
01:33 PM
Hm, yes, that should not be how it works. If a value hasn't decayed away or been removed, it will stick around forever. If it reaches 0, the user-item pair is removed. Negative values still have a meaning, so they are not removed just for being negative; what matters is how small the absolute value is. Yes, these are removed, including users that don't have any items.
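To make the removal rule concrete, here is a minimal sketch of one decay step followed by pruning. It is not Oryx's code (Oryx is Java). The function name, the nested-dict layout, and the strict `>` comparison against the threshold are all assumptions made for illustration.

```python
def decay_and_prune(strengths, decay_factor, zero_threshold):
    """Apply one decay step to a user -> {item: strength} map and prune it.

    Hypothetical helper: names and data layout are illustrative only.
    """
    result = {}
    for user, items in strengths.items():
        kept = {}
        for item, value in items.items():
            decayed = value * decay_factor
            # Negative strengths are meaningful, so test the absolute value.
            if abs(decayed) > zero_threshold:
                kept[item] = decayed
        # A user left with no items is dropped too.
        if kept:
            result[user] = kept
    return result


if __name__ == "__main__":
    data = {"u1": {"a": 1.0, "b": -0.05}, "u2": {"c": 0.01}}
    print(decay_and_prune(data, decay_factor=0.5, zero_threshold=0.02))
```

In this example `u2` disappears entirely because its only item decays below the threshold, while the negative value for `u1` survives because its absolute value is still large enough.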
12-12-2014
03:11 PM
model.generation.keep is really just a bookkeeping setting. It only affects how many previous generations are kept around for whatever purpose they may serve (backup, etc.). Each generation has a copy of all data that's ever been seen, in aggregated form, and reduced by the decay factor. So, no, the default behavior is to keep all data forever. The decay factor is an indirect way to implement a "sliding window", in that it makes old data go away eventually. It's not based on a hard time or generation limit, but that's desirable IMHO. The closest thing is to set a decay factor and a zero threshold such that roughly the desired number of generations decays a value of "1" to below the threshold.
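As a rough worked example of picking those two numbers, the sketch below assumes the decay is multiplicative, i.e. a value is multiplied by the decay factor once per generation. That assumption, and both function names, are mine rather than anything taken from the Oryx source.

```python
def generations_until_removed(decay_factor, zero_threshold, start=1.0):
    """Count generations until `start` decays to below `zero_threshold`.

    Assumes value -> value * decay_factor once per generation.
    """
    if not 0.0 < decay_factor < 1.0:
        raise ValueError("decay_factor must be in (0, 1)")
    if zero_threshold <= 0.0:
        raise ValueError("zero_threshold must be positive")
    value, generations = start, 0
    while abs(value) >= zero_threshold:
        value *= decay_factor
        generations += 1
    return generations


def decay_factor_for_window(generations, zero_threshold, start=1.0):
    """Decay factor that brings `start` down to about `zero_threshold`
    after the given number of generations."""
    return (zero_threshold / start) ** (1.0 / generations)


if __name__ == "__main__":
    # 0.5**3 = 0.125 is still above 0.1; 0.5**4 = 0.0625 is below it.
    print(generations_until_removed(0.5, 0.1))  # prints 4
    print(decay_factor_for_window(10, 0.1))
```

Because a single interaction can carry more weight than 1, and repeated interactions add up, this only gives an approximate window, which matches the "roughly" above.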
12-11-2014
08:57 AM
This may be semantics. If you have no data, you can't make any recommendation, so we must be talking about starting from some data. In the normal case there is only user-item interaction data, but you have side information that you're incorporating before the first interaction. OK. You can modify the code to simply add the new entry to the map containing IDs and feature vectors. What's the issue with that? I assume you're already modifying the code. Yes, you need to be careful about the locks, but there is not much else to know.
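The general shape of "add an entry to the ID-to-feature-vector map, under a lock" looks something like the sketch below. It is a generic illustration in Python, not Oryx's data structure (Oryx is Java and has its own locking scheme); the class and method names are hypothetical.

```python
import threading


class FeatureStore:
    """Hypothetical thread-safe map from an ID to its feature vector."""

    def __init__(self):
        self._vectors = {}
        self._lock = threading.Lock()

    def add_if_absent(self, entity_id, vector):
        """Insert a vector for a new ID; leave an existing entry alone."""
        with self._lock:
            if entity_id in self._vectors:
                return False
            # Copy so later changes by the caller don't leak into the store.
            self._vectors[entity_id] = list(vector)
            return True

    def get(self, entity_id):
        """Return a copy of the stored vector, or None if the ID is unknown."""
        with self._lock:
            vector = self._vectors.get(entity_id)
            return None if vector is None else list(vector)


if __name__ == "__main__":
    store = FeatureStore()
    store.add_if_absent("new-user", [0.1, -0.3, 0.7])
    print(store.get("new-user"))
```

The check and the insert happen under the same lock, so two threads adding the same new ID cannot overwrite each other.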
12-11-2014
02:34 AM
When you add a user-item association, you have at least 1 data point for the user and item! Before that, you have no info at all. You can't make any recommendations no matter what approach you use. You can add feature vectors directly, sure, but how would you know what to add?
12-09-2014
10:24 PM
It's a bit complex due to all the locks (2.x is simpler in this regard) but you should be able to trace the logic from something like PreferenceServlet, which can add new users/items to the data structures.
12-09-2014
12:16 PM
The implementations are entirely separate although they do the same thing at a high level. Here the sampling process is different enough that it made a difference only in one place, even though both are sampling the same things. This distributed/non-distributed distinction is historical; there are really two codebases here. This won't be carried forward in newer versions.
12-09-2014
09:07 AM
So the problem here is just that only a small number of user-item pairs are sampled to test convergence, and it turns out they consistently give a too-low estimate of the convergence value early on. A quick band-aid is to sample more, and to log better messages about it: https://github.com/cloudera/oryx/commit/1ea63b4e493e1cfcf6d1cdc271c52befcdd12402 Too much sampling slows things down unnecessarily, and I've struggled to find a good heuristic that balances the two. This probably deserves a better bit of logic later, but this change will make this work fine, as will turning down the convergence threshold.
12-08-2014
04:44 AM
1 Kudo
Yes, when you run on YARN, you see the driver and executors as YARN containers. It is no longer a stand-alone service. You need to use master "yarn-client" or "yarn-cluster". yarn-client may be simpler to start. Have a look at http://spark.apache.org/docs/latest/cluster-overview.html
12-07-2014
04:08 PM
One late reply here: this bug fix may be relevant to the original problem: https://github.com/cloudera/oryx/issues/99 I'll put this out soon in 1.0.1
12-07-2014
03:59 PM
OK, I see that too. That's a different problem, although slightly more benign. It uses a sample of all data to estimate whether convergence has happened, and here the sample somehow makes it look like convergence has happened too early. I'll look into why the sample is regularly biased. You can work around it by setting model.iterations.convergence-threshold to something low like 0.00001. Right now it's still running past 9 iterations on Hadoop and MAP is about 0.10, so that's the symptom; now to find the cause. Thanks for the issue reports, this data set has turned up some bugs.
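For readers unfamiliar with the idea, a sampled convergence check can be sketched as follows. This is not the Oryx implementation: the metric (mean absolute change in the estimated user-item value between two iterations), the uniform sampling, and all names are assumptions chosen to illustrate why a small sample can mislead.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sampled_mean_change(prev_x, prev_y, cur_x, cur_y, sample_size, rng):
    """Estimate how much predictions moved between two iterations.

    prev_x/cur_x map user ID -> factor vector, prev_y/cur_y map item ID ->
    factor vector. Only `sample_size` random user-item pairs are examined.
    """
    users = sorted(cur_x)
    items = sorted(cur_y)
    total = 0.0
    for _ in range(sample_size):
        u = rng.choice(users)
        i = rng.choice(items)
        before = dot(prev_x[u], prev_y[i])
        after = dot(cur_x[u], cur_y[i])
        total += abs(after - before)
    return total / sample_size


def has_converged(mean_change, threshold):
    # A lower threshold demands smaller movement before stopping.
    return mean_change < threshold


if __name__ == "__main__":
    x0 = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
    x1 = {"u1": [1.5, 0.0], "u2": [0.0, 1.5]}
    y = {"i1": [1.0, 1.0]}
    change = sampled_mean_change(x0, y, x1, y, 50, random.Random(0))
    print(change, has_converged(change, 0.00001))
```

With few samples the estimate has high variance, so it can dip under the threshold before the model has really settled; sampling more pairs or lowering the threshold both reduce that risk, at the cost of extra work or extra iterations.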