Member since: 07-29-2013
366 Posts
69 Kudos Received
71 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5199 | 03-09-2016 01:21 AM |
| | 4370 | 03-07-2016 01:52 AM |
| | 13719 | 02-29-2016 04:40 AM |
| | 4096 | 02-22-2016 03:08 PM |
| | 5104 | 01-19-2016 02:13 PM |
11-18-2014
01:07 PM
It means basically what it says: you're running a program that accesses /user/spark, but you're not running it as spark, the user that has access to that directory.
11-18-2014
12:06 PM
You're running on YARN, so you should see the application as a "FAILED" application in the Resource Manager UI. Click through and you can find the logs of individual containers, which should show some failure.
11-18-2014
10:54 AM
This isn't the problem. It's just a symptom of the app failing for another reason:

14/11/18 16:27:36 ERROR YarnClientSchedulerBackend: Yarn application already ended: FAILED

You'd have to look at the actual app worker logs to see why it's failing.
11-18-2014
10:52 AM
1 Kudo
Yes, you shouldn't be able to run this as a stand-alone app. Hm, try putting the jar file last? That is how the script says to do it.
11-18-2014
09:22 AM
You need <scope>provided</scope> as well.
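For illustration (not from the original post, and assuming an sbt build rather than Maven), the equivalent of Maven's <scope>provided</scope> looks like this:

```scala
// build.sbt -- sbt build definitions are Scala code.
// "provided" means Spark is on the compile classpath but is not bundled
// into your application jar; spark-submit supplies Spark's classes at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
```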
11-18-2014
08:42 AM
1 Kudo
How are you executing this? It sounds like you may not be using spark-submit, or are accidentally bundling Spark (perhaps a slightly different version) into your app. Spark deps should be 'provided' in your build, and you'll want to use spark-submit to submit. You don't set the master in your SparkConf in code.
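As a rough sketch of that pattern (the names and submit command below are illustrative, not taken from this thread), the driver leaves the master to spark-submit:

```scala
// Submitted with something like:
//   spark-submit --master yarn-cluster --class MyApp my-app.jar
// Spark itself is a 'provided' dependency, so it is not bundled in my-app.jar.
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // Note: no setMaster() here -- the master comes from spark-submit.
    val conf = new SparkConf().setAppName("MyApp")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).reduce(_ + _))
    sc.stop()
  }
}
```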
11-18-2014
08:34 AM
Hm, what do you mean by commutative and associative? And do you mean Hadoop clusters? I'm not sure there's a particular limit to what a Hadoop cluster can do well, other than that it's fundamentally a data-parallel paradigm. But most things can be done efficiently in this paradigm, especially random forests. The only things that don't work well are those that require extremely fast asynchronous communication -- MPI-style computations. Decision forests are strong, and you can certainly do them well on Hadoop.
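To make the commutative/associative point concrete (my example, not from the thread): an operation like addition can be applied per partition and the partial results merged in any order, which is exactly what a data-parallel reduce relies on.

```scala
import org.apache.spark.SparkContext

// Given an existing SparkContext: addition is associative ((a+b)+c == a+(b+c))
// and commutative (a+b == b+a), so each partition can be summed independently
// and the partial sums combined in whatever order they arrive.
def parallelSum(sc: SparkContext): Int =
  sc.parallelize(1 to 1000).reduce(_ + _)

// By contrast, subtraction is neither associative nor commutative,
// so reduce(_ - _) would give partition- and order-dependent results.
```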
11-17-2014
01:38 AM
1 Kudo
Random decision forests in MLlib 1.2 can do classification or regression. Yes, it can do bagging. I don't believe it's by feature, no.
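For reference, a minimal sketch of the MLlib 1.2 random forest API (parameter values are illustrative, and `data` is assumed to be an RDD[LabeledPoint] prepared elsewhere):

```scala
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Trains a random forest classifier; trainRegressor is the regression analogue.
def trainForest(data: RDD[LabeledPoint]) =
  RandomForest.trainClassifier(
    data,
    numClasses = 2,
    categoricalFeaturesInfo = Map[Int, Int](), // no categorical features
    numTrees = 100,
    featureSubsetStrategy = "auto",  // how many features to consider per split
    impurity = "gini",
    maxDepth = 5,
    maxBins = 32,
    seed = 42)
```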
11-16-2014
01:38 PM
1 Kudo
MLlib supports SVMs in Spark 1.1. It supports Decision Trees in 1.1, and Decision Forests in 1.2, which is not yet released. Mahout has an implementation of SVMs and Decision Forests; both are fairly old and MapReduce-based.
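As a small sketch of the Spark 1.1 MLlib SVM (variable names and the iteration count are examples, not from the post):

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// training: RDD[LabeledPoint] with 0/1 labels, prepared elsewhere.
// Trains a linear SVM via stochastic gradient descent; 100 iterations is illustrative.
def trainSvm(training: RDD[LabeledPoint]) =
  SVMWithSGD.train(training, 100)
```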
11-11-2014
05:22 AM
1 Kudo
Hm, why not just use the Spark that is part of CDH? If you want 1.1, can you update to CDH 5.2? Are there more logs? This isn't the underlying error.