Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4992 | 03-09-2016 01:21 AM |
|  | 4255 | 03-07-2016 01:52 AM |
|  | 13368 | 02-29-2016 04:40 AM |
|  | 3971 | 02-22-2016 03:08 PM |
|  | 4962 | 01-19-2016 02:13 PM |
12-07-2014 04:08 PM
One late reply here: this bug fix (https://github.com/cloudera/oryx/issues/99) may be relevant to the original problem. I'll put this out soon in 1.0.1.
12-07-2014 07:23 AM
Just to follow up, check out this thread (http://community.cloudera.com/t5/Apache-Hadoop-Concepts-and/spark-jobserver-and-spark-jobs-in-Hue-on-CDH5-2/m-p/22410#M1369), where I detail rebuilding the spark-jobserver and getting things to work. So it does look like the problem I encountered was due to the CDH 5.2 QuickStart VM shipping a version of spark-jobserver compiled against Spark 0.9.1, which caused the error through incompatibilities with Spark 1.1.0. Thanks, Chris
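For anyone hitting the same thing, the fix boils down to making the jobserver's Spark dependency match the cluster's version. Here is a minimal sketch of the relevant sbt line; the actual module names and settings in the spark-jobserver build may differ, and the version shown is just the one from this thread:

```scala
// build.sbt sketch: depend on the cluster's Spark version, marked "provided"
// so the cluster's own Spark jars are used at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
```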
12-05-2014 05:22 AM
1 Kudo
That's perhaps too broad to answer here. Generally, any algorithm that is data-parallel will do well on Spark (or indeed, on MapReduce), and ones that aren't data-parallel will not. I am not familiar with any of those algorithms, but that's the question to answer.
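As a concrete illustration (a minimal sketch, not from the original thread): word count is the canonical data-parallel job, since each partition is processed independently and per-key results are merged with an associative, commutative function. This assumes a spark-shell session where `sc` is the SparkContext; the input path is illustrative:

```scala
// Each partition is split and counted independently; reduceByKey then merges
// partial counts with an associative, commutative function (+), so the work
// parallelizes cleanly across the cluster.
val counts = sc.textFile("hdfs:///data/input")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.take(10).foreach(println)
```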
11-28-2014 08:27 AM
Hadoop still has config files, for sure. They can end up wherever you want them to. I thought they were still at $HADOOP_HOME/conf in the vanilla Hadoop tarball, but I took a look at 2.5.2 and they're at $HADOOP_HOME/etc/hadoop in fact. In any event, if they're at /usr/local/hadoop/etc/hadoop in your installation, then that's what you set $HADOOP_CONF_DIR to: just wherever they really are. This is one of Hadoop's standard environment variables, so if you're up and running, then this is working. Yes, that sounds like about what you do to install Snappy; those are libs that should be present on the cluster machines.
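A quick way to sanity-check the variable (a hedged sketch, not part of any Hadoop tool; core-site.xml is just one standard config file to probe for):

```scala
import java.nio.file.{Files, Paths}

// Confirm HADOOP_CONF_DIR points at a directory that really contains the
// Hadoop config files, e.g. core-site.xml.
object CheckHadoopConf extends App {
  val confDir = sys.env.getOrElse("HADOOP_CONF_DIR", sys.error("HADOOP_CONF_DIR is not set"))
  val coreSite = Paths.get(confDir, "core-site.xml")
  println(s"HADOOP_CONF_DIR = $confDir")
  println(s"core-site.xml readable: ${Files.isReadable(coreSite)}")
}
```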
11-19-2014 05:53 AM
Thanks very much. I am new to Spark and drank the Kool-Aid on the commutative and associative mandate for cluster-based algorithms, so I very much appreciate you providing an accurate view of implementations. I am very interested in the parallelization of SNA and ML algorithms on clusters and would appreciate any reading/references you can provide. Thanks again for your time and insight. In short: thanks, mate! Chris
11-19-2014 01:53 AM
It looks like you asked for more resources than you configured YARN to offer, so check how much you can allocate in YARN and how much Spark asked for. I don't know about the ERROR; it may be a red herring. Please have a look at http://spark.apache.org/docs/latest/ for pretty good Spark docs.
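For reference, a hedged sketch of making the ask explicit so it can be compared against YARN's limits (yarn.scheduler.maximum-allocation-mb and friends); the values here are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

// Each executor's memory (plus overhead) must fit within YARN's per-container
// maximum allocation, and the total ask across executors must fit the cluster.
val conf = new SparkConf()
  .setAppName("resource-check")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.instances", "4")
```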
11-19-2014 12:10 AM
1 Kudo
I'm not suggesting you log in as spark or a superuser; you shouldn't do this. Instead, change your app so it doesn't access directories your user has no permission for.
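As a concrete illustration (a minimal sketch assuming a spark-shell session where `sc` is the SparkContext; the path is illustrative): write under your own HDFS home directory instead of one owned by spark or a superuser.

```scala
// /user/<name> is the conventional per-user HDFS home directory, which the
// submitting user normally has write access to.
val user = sys.props("user.name")
sc.parallelize(Seq("ok")).saveAsTextFile(s"/user/$user/permission-check")
```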
11-10-2014 02:43 PM
1 Kudo
Apologies, I'm mixing up 1.x and 2.x. The default evaluation metric in 1.x is mean average precision (MAP), a measure of how much the top recommendations contained items that were held out for the user. In local mode you can find lines like "Mean average precision: xxx" in the logs. In distributed mode, now that I review the code, I don't see that it is ever logged; it is written to a file called "MAP" under the subdirectory for the iteration. I can at least make the mapper workers output their own local value of MAP. In 2.x the metric is AUC, which is basically a measure of how likely it is that a 'good' recommendation (from the held-out data set) ranks above a random item. It is a broader, different measure. This you should find printed in the logs if you're using 2.x, for sure, along with the hyperparams that yielded it.
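In case the definition helps, here's a minimal sketch of average precision for a single user; this is the textbook formula, not Oryx's actual code. MAP is then the mean of this value over all users.

```scala
// Average precision: walk the ranked recommendations; at each hit on a
// held-out item, record precision at that rank; average over the held-out set.
def averagePrecision(recs: Seq[String], heldOut: Set[String]): Double = {
  if (heldOut.isEmpty) return 0.0
  var hits = 0
  var sumPrecision = 0.0
  for ((item, rank) <- recs.zipWithIndex if heldOut.contains(item)) {
    hits += 1
    sumPrecision += hits.toDouble / (rank + 1)
  }
  sumPrecision / heldOut.size
}
```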
11-10-2014 02:18 PM
Hello srowen, what I meant was analysis using Mahout algorithms on top of a Hadoop cluster, as well as MapReduce preprocessing tasks, for instance tokenization, stemming, and translation of 10 million tweets. Thank you
11-06-2014 02:02 PM
Including all the jars worked like a charm. Regarding the "64MB limit", I wasn't able to upload the uber-jar to HDFS via Hue (Error:Undefined), and from some searches I saw people claiming they thought it was a size issue. Thanks!
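For anyone else blocked by the Hue upload, a hedged sketch of pushing the jar with the HDFS API directly (paths are illustrative; hadoop fs -put does the same from the command line):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Copies the local assembly jar into HDFS without going through Hue.
val fs = FileSystem.get(new Configuration())
fs.copyFromLocalFile(new Path("target/app-assembly.jar"),
                     new Path("/user/cloudera/jars/app-assembly.jar"))
```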