Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4992 | 03-09-2016 01:21 AM |
|  | 4255 | 03-07-2016 01:52 AM |
|  | 13368 | 02-29-2016 04:40 AM |
|  | 3971 | 02-22-2016 03:08 PM |
|  | 4962 | 01-19-2016 02:13 PM |
12-07-2014 04:08 PM
One late reply here: this bug fix (https://github.com/cloudera/oryx/issues/99) may be relevant to the original problem. I'll put this out soon in 1.0.1.
12-07-2014 07:23 AM
Just to follow up, check out this thread (http://community.cloudera.com/t5/Apache-Hadoop-Concepts-and/spark-jobserver-and-spark-jobs-in-Hue-on-CDH5-2/m-p/22410#M1369), where I detail rebuilding the spark-jobserver and getting things to work. So it does look like the problem I encountered was due to the CDH 5.2 QuickStart VM shipping a version of spark-jobserver compiled against Spark 0.9.1, which caused the error through incompatibilities with Spark 1.1.0. Thanks, Chris
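For anyone hitting the same thing, the fix boils down to making the jobserver's Spark dependency match the cluster's version. Here is a minimal sketch of the relevant sbt line; the actual module names and settings in the spark-jobserver build may differ, and the version shown is just the one from this thread:

```scala
// build.sbt sketch: depend on the cluster's Spark version, marked "provided"
// so the cluster's own Spark jars are used at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
```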
12-05-2014 05:22 AM
1 Kudo
That's perhaps too broad to answer here. Generally, any algorithm that is data-parallel will do well on Spark (or indeed, on MapReduce), and ones that aren't data-parallel will not. I am not familiar with any of those algorithms, but that's the question to answer.
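As a concrete illustration (a minimal sketch, not from the original thread): word count is the canonical data-parallel job, since each partition is processed independently and per-key results are merged with an associative, commutative function. This assumes a spark-shell session where `sc` is the SparkContext; the input path is illustrative:

```scala
// Each partition is split and counted independently; reduceByKey then merges
// partial counts with an associative, commutative function (+), so the work
// parallelizes cleanly across the cluster.
val counts = sc.textFile("hdfs:///data/input")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.take(10).foreach(println)
```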
11-28-2014 08:27 AM
Hadoop still has config files, for sure. They can end up wherever you want them to. I thought they were still at $HADOOP_HOME/conf in the vanilla Hadoop tarball, but I took a look at 2.5.2 and they're at $HADOOP_HOME/etc/hadoop in fact. In any event, if they're at /usr/local/hadoop/etc/hadoop in your installation, then that's what you set $HADOOP_CONF_DIR to: just wherever they really are. This is one of Hadoop's standard environment variables, so if you're up and running, then this is working. Yes, that sounds like about what you do to install Snappy; those are libs that should be present on the cluster machines.
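A quick way to sanity-check the variable (a hedged sketch, not part of any Hadoop tool; core-site.xml is just one standard config file to probe for):

```scala
import java.nio.file.{Files, Paths}

// Confirm HADOOP_CONF_DIR points at a directory that really contains the
// Hadoop config files, e.g. core-site.xml.
object CheckHadoopConf extends App {
  val confDir = sys.env.getOrElse("HADOOP_CONF_DIR", sys.error("HADOOP_CONF_DIR is not set"))
  val coreSite = Paths.get(confDir, "core-site.xml")
  println(s"HADOOP_CONF_DIR = $confDir")
  println(s"core-site.xml readable: ${Files.isReadable(coreSite)}")
}
```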
11-19-2014 05:53 AM
Thanks very much. I am new to Spark and drank the Kool-Aid on the commutative and associative mandate for cluster-based algorithms, so I very much appreciate you providing an accurate view of implementations. I am very interested in the parallelization of SNA and ML algorithms on clusters and would appreciate any reading/references you can provide. Thanks again for your time and insight. In short: thanks, mate! Chris
11-19-2014 01:53 AM
It looks like you asked for more resources than you configured YARN to offer, so check how much you can allocate in YARN and how much Spark asked for. I don't know about the ERROR; it may be a red herring. Please have a look at http://spark.apache.org/docs/latest/ for pretty good Spark docs.
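For reference, a hedged sketch of making the ask explicit so it can be compared against YARN's limits (yarn.scheduler.maximum-allocation-mb and friends); the values here are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

// Each executor's memory (plus overhead) must fit within YARN's per-container
// maximum allocation, and the total ask across executors must fit the cluster.
val conf = new SparkConf()
  .setAppName("resource-check")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.instances", "4")
```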
11-19-2014 12:10 AM
1 Kudo
I'm not suggesting you log in as spark or a superuser; you shouldn't do this. Instead, change your app so it doesn't access directories your user has no permission for.
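As a concrete illustration (a minimal sketch assuming a spark-shell session where `sc` is the SparkContext; the path is illustrative): write under your own HDFS home directory instead of one owned by spark or a superuser.

```scala
// /user/<name> is the conventional per-user HDFS home directory, which the
// submitting user normally has write access to.
val user = sys.props("user.name")
sc.parallelize(Seq("ok")).saveAsTextFile(s"/user/$user/permission-check")
```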
11-10-2014 02:43 PM
1 Kudo
Apologies, I'm mixing up 1.x and 2.x. The default evaluation metric in 1.x is mean average precision (MAP), a measure of how much the top recommendations contained items that were held out for the user. In local mode you can find lines like "Mean average precision: xxx" in the logs. In distributed mode, now that I review the code, I don't see that it is ever logged; it is written to a file called "MAP" under the subdirectory for the iteration. I can at least make the mapper workers output their own local value of MAP. In 2.x the metric is AUC, which is basically a measure of how likely it is that a 'good' recommendation (from the held-out data set) ranks above a random item. It is a broader, different measure. This you should find printed in the logs if you're using 2.x, for sure, along with the hyperparams that yielded it.
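In case the definition helps, here's a minimal sketch of average precision for a single user; this is the textbook formula, not Oryx's actual code. MAP is then the mean of this value over all users.

```scala
// Average precision: walk the ranked recommendations; at each hit on a
// held-out item, record precision at that rank; average over the held-out set.
def averagePrecision(recs: Seq[String], heldOut: Set[String]): Double = {
  if (heldOut.isEmpty) return 0.0
  var hits = 0
  var sumPrecision = 0.0
  for ((item, rank) <- recs.zipWithIndex if heldOut.contains(item)) {
    hits += 1
    sumPrecision += hits.toDouble / (rank + 1)
  }
  sumPrecision / heldOut.size
}
```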
11-10-2014 02:18 PM
Hello srowen, what I meant was analysis using Mahout algorithms on top of a Hadoop cluster, as well as MapReduce preprocessing tasks, for instance tokenization, stemming, and translation of 10 million tweets. Thank you
11-06-2014 02:02 PM
Including all the jars worked like a charm. Regarding the "64MB limit", I wasn't able to upload the uber-jar to HDFS via Hue (Error:Undefined), and from some searches I saw people claiming they thought it was a size issue. Thanks!
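For anyone else blocked by the Hue upload, a hedged sketch of pushing the jar with the HDFS API directly (paths are illustrative; hadoop fs -put does the same from the command line):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Copies the local assembly jar into HDFS without going through Hue.
val fs = FileSystem.get(new Configuration())
fs.copyFromLocalFile(new Path("target/app-assembly.jar"),
                     new Path("/user/cloudera/jars/app-assembly.jar"))
```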