About srowen

srowen · 11-20-2014

I would not mess with the installed JARs. In fact, it's possible something was changed inadvertently in the process. My guess is that you are packaging Spark with your app and it is not the same version as the one on the cluster. You should not package Spark code with a Spark app, since the cluster already provides it.

srowen · 11-19-2014

It looks like you asked for more resources than you configured YARN to offer, so compare how much YARN can allocate with how much Spark asked for. I don't know about the ERROR; it may be a red herring. Please have a look at http://spark.apache.org/docs/latest/ for pretty good Spark docs.
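A rough way to do that comparison is sketched below in Python. It is a hypothetical helper, not part of Spark or YARN. It assumes each executor asks YARN for its heap plus a fixed off-heap overhead (384 MB here, which was the `spark.yarn.executor.memoryOverhead` default in the Spark 1.1 era; newer releases compute it differently), and that the request must fit under both `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb`. It ignores cores and YARN's rounding of requests to the minimum allocation.

```python
# Hypothetical check: does one Spark executor's memory request fit in YARN?
# All sizes are in MB.


def container_request_mb(executor_memory_mb: int, overhead_mb: int = 384) -> int:
    """Memory requested from YARN per executor: heap plus off-heap overhead.

    Assumption: 384 MB overhead (Spark 1.1-era default); check your version's docs.
    """
    return executor_memory_mb + overhead_mb


def fits_in_yarn(executor_memory_mb: int,
                 max_allocation_mb: int,
                 node_memory_mb: int,
                 overhead_mb: int = 384) -> bool:
    """True if the per-executor request is within both YARN limits.

    max_allocation_mb: stands in for yarn.scheduler.maximum-allocation-mb
    node_memory_mb:    stands in for yarn.nodemanager.resource.memory-mb
    """
    request = container_request_mb(executor_memory_mb, overhead_mb)
    return request <= max_allocation_mb and request <= node_memory_mb


if __name__ == "__main__":
    # A 2 GB heap does not fit a 2 GB container cap once overhead is added.
    print(fits_in_yarn(2048, max_allocation_mb=2048, node_memory_mb=8192))  # False
    print(fits_in_yarn(1536, max_allocation_mb=2048, node_memory_mb=8192))  # True
```

The point of the overhead term is that the number you pass as executor memory is not the number YARN sees, which is a common reason a request that looks small enough is still refused.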

srowen · 11-19-2014

Yes, in that example you are clearly running on YARN, so you do see it in the history, right? It looks like the example uses yarn-cluster mode, which means the driver was launched on YARN, not locally. The output will be in the logs of the YARN container that ran the driver. Try yarn-client instead to make your local process the driver; it should then print the result on your console.
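For reference, here is a minimal Python sketch that assembles the two command lines. `spark_submit_args` is a hypothetical helper, and `spark-examples.jar` is a placeholder for wherever your examples jar actually lives. The `yarn-client` and `yarn-cluster` master values are the Spark 1.x spelling used in this thread; newer releases spell this `--master yarn --deploy-mode client`.

```python
import shlex
from typing import List, Optional


def spark_submit_args(main_class: str, app_jar: str, deploy_mode: str = "client",
                      app_args: Optional[List[str]] = None) -> List[str]:
    """Build a Spark 1.x-style spark-submit command for YARN.

    deploy_mode="client":  the driver is this local process, so its output
                           (e.g. the computed result) prints on your console.
    deploy_mode="cluster": the driver runs in a YARN container, so its output
                           lands in that container's logs instead.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    args = ["spark-submit",
            "--class", main_class,
            "--master", "yarn-" + deploy_mode,  # Spark 1.x master URL spelling
            app_jar]
    args.extend(app_args or [])
    return args


if __name__ == "__main__":
    # spark-examples.jar is a placeholder path to the examples jar.
    cmd = spark_submit_args("org.apache.spark.examples.SparkPi",
                            "spark-examples.jar", "client", ["10"])
    print(shlex.join(cmd))
    # spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client spark-examples.jar 10
```

To actually launch it, you could pass the list to `subprocess.run(cmd, check=True)` on a machine that has `spark-submit` on its PATH.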

srowen · 11-19-2014

Are you running Spark on YARN, or using Spark standalone? If the latter, you won't see any YARN history since it's not using YARN.

srowen · 11-19-2014

I'm not suggesting you log in as spark or a superuser. You shouldn't do this. Instead, change your app to not access directories you don't have access to as your user.

srowen · 11-19-2014

Yes, I know what commutativity and associativity are; I was wondering how they relate to Hadoop and decision forests. In theory a reduce function should be commutative and associative, but in practice it does not need to be in MapReduce. A MapReduce job as a whole is not constrained this way, and Spark certainly is not. In practice, neither paradigm imposes a limitation of this form. I looked into the MLlib RDF code, and it does look like it also selects features at random, depending on the configuration. So you could say it bags by both examples and features. The Oryx implementation also certainly does all of what you describe: https://github.com/cloudera/oryx/tree/master/rdf-computation
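To make bagging by examples and features concrete, here is a small Python sketch of what one tree in a random forest might train on: a bootstrap sample of example indices drawn with replacement, plus a random subset of feature indices drawn without replacement. The sqrt(number of features) subset size is a common choice, assumed here for illustration. This is not the MLlib or Oryx code; implementations differ on whether the feature subset is drawn once per tree, as here, or again at each split.

```python
import math
import random
from typing import List, Tuple


def bag_for_tree(n_examples: int, n_features: int,
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one tree's training view.

    rows:     bootstrap sample of example indices (with replacement, size n)
    features: random subset of feature indices (without replacement),
              sized sqrt(n_features) as an assumed default
    """
    rows = [rng.randrange(n_examples) for _ in range(n_examples)]
    k = max(1, int(math.sqrt(n_features)))
    features = sorted(rng.sample(range(n_features), k))
    return rows, features


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    for tree in range(3):
        rows, features = bag_for_tree(10, 16, rng)
        print(f"tree {tree}: rows={rows} features={features}")
```

Each tree sees a different mix of repeated and missing examples and a different slice of the feature space, which is what decorrelates the trees.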

srowen · 11-19-2014

That sounds like a bad command line. I don't see that path in the instructions either. Check that you are following the instructions for CDH 5.2 in the previous link.

srowen · 11-18-2014

You should use the documentation for CDH 5.2, which is what you are using and which corresponds to Spark 1.1: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_spark_apps.html. You are looking at the docs for CDH 5.0.x, which corresponds to Spark 0.9. A lot has changed since then.

srowen · 10-29-2014

By default, the distributed implementations prune infrequent items and low-similarity item pairs in order to scale up. With a tiny data set, this can mean some items are removed entirely or become un-recommendable. While you can change this behavior, it is not useful to run the Hadoop-based job on such a small data set.
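A toy Python illustration of that effect, assuming two hypothetical pruning rules: drop items seen by fewer than `min_users` users, then keep only item pairs whose Jaccard similarity is at least `min_similarity`. This is not Mahout's implementation or its option names, just the general idea. With five users, items C and D vanish entirely, and item E survives but has no similar items left, so an item-based recommender can never suggest it.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def prune(prefs: Dict[str, Set[str]], min_users: int,
          min_similarity: float) -> Tuple[Set[str], Dict[Pair, float]]:
    """Hypothetical pruning: drop rare items, then weak item-item pairs."""
    # Invert user -> items into item -> users.
    users_by_item: Dict[str, Set[str]] = {}
    for user, items in prefs.items():
        for item in items:
            users_by_item.setdefault(item, set()).add(user)

    # Rule 1: keep only items seen by at least min_users users.
    kept = {i for i, users in users_by_item.items() if len(users) >= min_users}

    # Rule 2: keep only pairs whose Jaccard similarity clears the threshold.
    sims: Dict[Pair, float] = {}
    for a, b in combinations(sorted(kept), 2):
        ua, ub = users_by_item[a], users_by_item[b]
        sim = len(ua & ub) / len(ua | ub)
        if sim >= min_similarity:
            sims[(a, b)] = sim
    return kept, sims


def recommendable(sims: Dict[Pair, float]) -> Set[str]:
    """Items that still have at least one similar item after pruning."""
    return {item for pair in sims for item in pair}


if __name__ == "__main__":
    prefs = {
        "u1": {"A", "B"},
        "u2": {"A", "B", "C"},
        "u3": {"B", "D"},
        "u4": {"E"},
        "u5": {"E"},
    }
    kept, sims = prune(prefs, min_users=2, min_similarity=0.5)
    all_items = set().union(*prefs.values())
    print("dropped entirely:", sorted(all_items - kept))                 # ['C', 'D']
    print("kept, but no neighbors:", sorted(kept - recommendable(sims)))  # ['E']
```

On a realistic data set these thresholds discard only a thin tail; on five users they discard most of the signal.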

srowen · 10-29-2014

There is no user-based recommender based on Hadoop MapReduce. The closest thing is indeed the item-based implementation in https://github.com/apache/mahout/tree/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item. It is also a recommender, but no, it always uses item similarity.
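For intuition about what item-similarity-based recommendation computes, here is a hypothetical Python sketch: a candidate item's score for a user is the similarity-weighted average of that user's preferences for the items they already rated. This is a textbook formulation assumed for illustration, not the code in the linked Mahout package; the similarity table is supplied by hand rather than computed.

```python
from typing import Dict, List, Tuple

Prefs = Dict[str, Dict[str, float]]   # user -> {item: preference value}
Sims = Dict[Tuple[str, str], float]   # (item, item) -> similarity, stored once per pair


def sim(sims: Sims, a: str, b: str) -> float:
    """Look up a symmetric similarity; missing pairs count as 0."""
    return sims.get((a, b), sims.get((b, a), 0.0))


def recommend(prefs: Prefs, sims: Sims, user: str,
              n: int = 3) -> List[Tuple[str, float]]:
    """Score unrated items by a similarity-weighted average of the user's ratings."""
    rated = prefs[user]
    # Candidates: items that appear in the similarity table but are not yet rated.
    candidates = {i for pair in sims for i in pair} - set(rated)
    scores = []
    for item in candidates:
        num = den = 0.0
        for j, pref in rated.items():
            s = sim(sims, item, j)
            num += s * pref
            den += abs(s)
        if den > 0:  # skip items unrelated to anything the user rated
            scores.append((item, num / den))
    # Highest score first; ties broken by item id for stable output.
    scores.sort(key=lambda x: (-x[1], x[0]))
    return scores[:n]


if __name__ == "__main__":
    prefs = {"alice": {"A": 5.0, "B": 3.0}}
    sims = {("A", "C"): 0.9, ("B", "C"): 0.1,
            ("A", "D"): 0.2, ("B", "D"): 0.8, ("A", "B"): 0.5}
    print(recommend(prefs, sims, "alice"))
```

No user-user similarity appears anywhere: users only contribute their own ratings, and all neighborhood information comes from the item-item table.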

Last Visited	02-13-2018 12:34 PM

Member Since	08-11-2014 09:17 AM
Posts	481
Kudos received	87

Cloudera Community

Re: Own code editor in CDSW?

Re: error using Pandas within PySpark transformati...

Re: Does CDSW need to be part of the cluster?

Re: Local Data combined with HDFS

Re: Where can I find Oryx 1.x releases (or GitHub)

Re: Spark java.lang.IllegalStateException: unread ...

Re: Spark on YARN in CDH-5

Re: Spark on YARN in CDH-5

Re: No Completed Application Found in Spark Histor...

Re: Problems with cloudera quickstart/spark/readin...

Re: Does Spark ML and Mahout support Random Forest...

Re: Spark on YARN in CDH-5

Re: Spark on YARN in CDH-5

Re: Distributed Recommender not giving recommendat...

Re: Distributed recommenderjob for user-based reco...