Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 4990 | 03-09-2016 01:21 AM |
|  | 4255 | 03-07-2016 01:52 AM |
|  | 13366 | 02-29-2016 04:40 AM |
|  | 3968 | 02-22-2016 03:08 PM |
|  | 4962 | 01-19-2016 02:13 PM |
09-22-2015
02:36 AM
I wouldn't modify that file. Instead, include your libraries with your app or pass them via --jars, and also try setting spark.{driver,executor}.userClassPathFirst to true. Resolving these conflicts is tricky when your app uses a library that Spark also uses but does not shade, but this is the answer in most cases.
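For illustration, here's a minimal sketch of setting those properties programmatically, assuming you construct the SparkContext yourself; the app name is a placeholder, and the same settings can equally be passed as --conf flags to spark-submit:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Prefer classes from user-supplied jars over Spark's bundled copies,
// on both the driver and the executors.
val conf = new SparkConf()
  .setAppName("my-app") // placeholder
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
val sc = new SparkContext(conf)
```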
09-19-2015
06:40 AM
Replication is an HDFS-level configuration. It isn't something you configure from Spark, and you don't have to worry about it from Spark. AFAIK you set a global default replication factor, but it can also be overridden per file (for example, recursively across a directory). I think you want to pursue this via HDFS.
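If you ever did need to do it from code rather than via HDFS configuration, a rough sketch using Hadoop's FileSystem API could look like the following; the path is a placeholder, and the usual routes are dfs.replication in hdfs-site.xml or the hdfs dfs -setrep command:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Set the replication factor of one existing HDFS file to 2.
// This is an HDFS-level operation; Spark itself is not involved.
val fs = FileSystem.get(new Configuration())
fs.setReplication(new Path("/data/output/part-00000"), 2.toShort)
```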
09-17-2015
02:09 AM
1 Kudo
I suppose you can cluster the term vectors in V·S for this purpose, to discover related terms and thus topics. This is the type of problem where you would more usually use LDA. I know you're using Mahout, but if you ever consider using Spark, there's a chapter on exactly this in our book: http://shop.oreilly.com/product/0636920035091.do
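To make the idea concrete, here's a rough, hedged sketch in Spark MLlib; it assumes an existing RDD[Vector] of TF-IDF document vectors called docVectors and a SparkContext sc, and the rank and cluster counts are arbitrary placeholders:

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Compute a truncated SVD of the term-document matrix, then cluster
// the rows of V*S (one row per term) to find groups of related terms.
val svd = new RowMatrix(docVectors).computeSVD(k = 100, computeU = false)
val termVectors = sc.parallelize(
  (0 until svd.V.numRows).map { i =>
    Vectors.dense(Array.tabulate(svd.V.numCols)(j => svd.V(i, j) * svd.s(j)))
  }
)
val model = KMeans.train(termVectors, 20, 10) // 20 clusters, 10 iterations
```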
09-17-2015
01:30 AM
1 Kudo
The output is as you say: these are the products of the SVD. What you do with them depends on what you're trying to achieve. For example, you can look at V·S to study term similarities, or U·S to discover document similarities.
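As a small, hedged sketch of the document-similarity side in Spark MLlib (it assumes svd came from RowMatrix.computeSVD with computeU = true, so U is available as a RowMatrix; the two row indices compared are placeholders):

```scala
// One row of U*S per document: scale each row of U by the singular values.
val docVectors = svd.U.rows.map(_.toArray.zipWithIndex.map { case (v, j) => v * svd.s(j) })

// Compare two documents by cosine similarity of their U*S rows.
def cosine(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => x * y }.sum /
    (math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum))

val Array(docA, docB) = docVectors.take(2)
println(cosine(docA, docB))
```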
09-14-2015
09:05 AM
1 Kudo
That would be unsupported. I think you'd find that support would still try to help in this case, but would decline to pursue it if it legitimately looked like a problem with Spark 1.4 itself. Spark 1.5 will of course be supported in CDH 5.5, coming soon.
09-13-2015
12:04 PM
In general it means the executors need more memory, but it's a fairly complex question. Maybe you need smaller tasks so that peak memory usage is lower; maybe you should cache less, or cache with a less memory-hungry storage level; or simply give the executors more memory. Better GC settings may help at the margins. Usually the place to start is deciding whether your computation is inherently going to scale badly and run out of memory in a certain stage.
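As a loose sketch of those knobs (the input path, partition count, and memory size are arbitrary placeholders, and an existing SparkContext sc is assumed):

```scala
import org.apache.spark.storage.StorageLevel

// Placeholder input; any large RDD would do.
val someRdd = sc.textFile("hdfs:///data/input")

// More partitions means smaller tasks, so peak memory per task is lower.
val smallerTasks = someRdd.repartition(400)

// A serialized, disk-spilling storage level keeps far less on the heap
// than the default MEMORY_ONLY caching.
smallerTasks.persist(StorageLevel.MEMORY_AND_DISK_SER)

// More executor memory, and GC tuning at the margins, are set at submit time, e.g.:
//   spark-submit --executor-memory 8g \
//     --conf spark.executor.extraJavaOptions="-XX:+UseG1GC" ...
```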
09-13-2015
12:36 AM
This much basically says "the executor stopped for some reason". You'd have to dig into the application via YARN, click through to its entry in the history server, and browse the executor logs to see if you can find any exceptions. It sounds like it stopped responding. As a guess, you might be out of memory and stuck in GC thrashing.
01-15-2015
07:53 PM
1 Kudo
Is this library bundled with your app? One guess is that it is not, and that it happens to be on the driver's classpath by way of another dependency, but isn't accidentally present on the executors too.
01-05-2015
03:22 AM
No, it is not the same 'because' computation as in the paper. The one in the paper is better, but it requires storing a k x k matrix for every user, or computing it on the fly, both of which are pretty prohibitive; they're not hard to implement, though. This one is a cheap, non-personalized computation based on item similarity. And no, the system does not serve the original data, just results from the factored model. The assumption is that, if the caller needs that information, the caller already has it; since it's generally not specific to the core recommender, accessing that data is not part of the engine.
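Purely to illustrate the shape of that cheap item-similarity approach (this is not the project's actual code; itemFactors, the item IDs, and the factor vectors are all hypothetical):

```scala
// Rank the items in a user's history by cosine similarity of their latent
// factor vectors to the recommended item's vector; the top ones are the "because".
def cosine(a: Array[Double], b: Array[Double]): Double = {
  val dot = a.zip(b).map { case (x, y) => x * y }.sum
  dot / (math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum))
}

def because(recommended: String,
            userHistory: Seq[String],
            itemFactors: Map[String, Array[Double]]): Seq[(String, Double)] = {
  val target = itemFactors(recommended)
  userHistory
    .flatMap(id => itemFactors.get(id).map(f => id -> cosine(target, f)))
    .sortBy(-_._2)
}
```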
12-23-2014
12:47 AM
That's fine. The machine needs to be able to communicate with the cluster, of course. Usually you would also make the Hadoop configuration visible on that machine and point to it with HADOOP_CONF_DIR. I think that will be required to get MapReduce to work.