Created on 12-04-2014 10:11 PM - edited 09-16-2022 02:14 AM
I downloaded the Quick Start VM with CDH 5.2.x so that I could try out CDH 5.2 on VirtualBox.
I can run a Spark job from the command line, but I've been pretty frustrated trying to get a Spark job to run using the Spark app in Hue (http://quickstart.cloudera:8888/spark).
No matter what I try I get a NoSuchMethodError:
Your application has the following error(s):

{
  "status": "ERROR",
  "result": {
    "errorClass": "java.lang.RuntimeException",
    "cause": "org.apache.spark.SparkContext$.$lessinit$greater$default$2()Lscala/collection/Map;",
    "stack": [
      "spark.jobserver.JobManagerActor.createContextFromConfig(JobManagerActor.scala:241)",
      "spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:94)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)",
      "ooyala.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)",
      "ooyala.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:26)",
      "ooyala.common.akka.Slf4jLogging$class.ooyala$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:35)",
      "ooyala.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:25)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)",
      "scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)",
      "ooyala.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:24)",
      "akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)",
      "akka.actor.ActorCell.invoke(ActorCell.scala:456)",
      "akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)",
      "akka.dispatch.Mailbox.run(Mailbox.scala:219)",
      "akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)",
      "scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)",
      "scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)",
      "scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)",
      "scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)"
    ],
    "causingClass": "java.lang.NoSuchMethodError",
    "message": "java.lang.NoSuchMethodError: org.apache.spark.SparkContext$.$lessinit$greater$default$2()Lscala/collection/Map;"
  }
}
(error 500)
I've attempted to build my own apps following some of the spark-jobserver guides, and have followed the second half of http://gethue.com/get-started-with-spark-deploy-spark-server-and-compute-pi-from-your-web-browser/ to clone the spark-jobserver examples and build them.
I've found people complaining about something similar in the spark-jobserver issues (https://github.com/ooyala/spark-jobserver/issues/29). That issue still appears to be open, and its originator indicated things started failing when moving from Spark 0.9.1 to 1.0. The CDH 5.2 QuickStart VM I have does seem to have Spark 1.1.0, but I'm giving Cloudera the benefit of the doubt that they didn't bump the version of Spark in this VM in a way that makes it incompatible with the spark-jobserver that Hue uses.
I would appreciate any guidance. I'd like to get something working so I can start playing around with kicking off jobs and getting results over HTTP from an external app talking to the spark-jobserver.
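To be concrete about the HTTP part, these are the kinds of calls I have in mind, going by the spark-jobserver README (8090 is its default port; the jar and class names here are just placeholders):

curl --data-binary @my-test-job.jar localhost:8090/jars/test
curl -d "" 'localhost:8090/jobs?appName=test&classPath=com.example.MyTestJob'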
Thanks!
Created 12-05-2014 06:10 AM
That's a weird one indeed. A NoSuchMethodError generally means you built your app against a different version of a library than the one you run against.
CDH 5.2 contains Spark 1.1.0 plus a few critical upstream fixes. None of those changes should affect source or binary compatibility. You can always build against the exact CDH 5.2 artifacts anyway, but it shouldn't matter: 1.1.0 and 1.1.1 are the same from an API perspective.
But, here, there should be no incompatibility at all anyway, since you should not be bundling Spark libs (or Hadoop libs) with your app. Are you marking them as 'provided' dependencies?
Created 12-05-2014 07:49 AM
Yep...I already tried building against the exact CDH 5.2 artifacts. From my pom.xml:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0-cdh5.2.0</version>
  <scope>provided</scope>
</dependency>
To reference that artifact, I had to add the Cloudera repository to the pom.xml. I checked the generated jar after packaging with Maven and verified that the only thing included was my simple test class. I've modified my simple test class (Java) to try different approaches: 1) just having a main() method that sets up the SparkContext in a way that works fine with spark-submit, and 2) changing the class to implement SparkJob so it should be runnable by the jobserver and be handed the SparkContext (both variants are sketched below).
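For reference, the repository entry I added looks like this (the URL is Cloudera's publicly documented Maven repo; adjust if it has moved):

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>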
Since I'm just using Java, I'm not referencing any Scala libraries and not using the Scala build tools. But just in case that was messing something up, I set up the Scala build tools, and that's when I went to compile the spark-jobserver tests themselves and tried to run spark.jobserver.WordCountExample. I didn't get a chance to verify which version of Scala those examples would be built against by default, though, so it's possible they were built against an older version.
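Here are sketches of the two variants of my test class. The main() variant is just the standard Spark Java API; the SparkJob variant follows the ooyala spark-jobserver trait as best I understand it, so double-check the method signatures against the jobserver version you actually run:

// Variant 1: plain main() entry point; this is what runs fine via spark-submit.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SimpleTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SimpleTest");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // trivial action, just to prove the context works
        long n = sc.parallelize(java.util.Arrays.asList(1, 2, 3, 4)).count();
        System.out.println("count = " + n);
        sc.stop();
    }
}

// Variant 2 (separate file): implement the jobserver's SparkJob trait so the
// server hands us its SparkContext instead of us creating one.
import com.typesafe.config.Config;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import spark.jobserver.SparkJob;
import spark.jobserver.SparkJobValid$;
import spark.jobserver.SparkJobValidation;

public class SimpleJob implements SparkJob {
    @Override
    public Object runJob(SparkContext sc, Config config) {
        // wrap the provided context for the Java API; no new context is created
        JavaSparkContext jsc = new JavaSparkContext(sc);
        return jsc.parallelize(java.util.Arrays.asList(1, 2, 3, 4)).count();
    }

    @Override
    public SparkJobValidation validate(SparkContext sc, Config config) {
        return SparkJobValid$.MODULE$; // accept any config for this test
    }
}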
Created 12-05-2014 07:52 AM
Yes, that's fine then. You don't even need to build against the CDH artifacts. You do need to use spark-submit, though. Could it be an issue with the jobserver? How are you deploying it, and is it consistent with your CDH installation?
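That is, something along these lines (the class and jar names are placeholders):

spark-submit --class SimpleTest --master local[2] my-test-job.jar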
Created 12-05-2014 07:55 AM
I'm just using the CDH 5.2 QuickStart VM. As far as I could see, the spark-jobserver was already included and running. Maybe I'm mistaken? And why do I need to execute with spark-submit? I should just be able to navigate to the Spark Editor in Hue, upload my jar, and execute it there, correct? (Under the covers it is of course doing something.)
Created 12-05-2014 08:04 AM
Got it, spark-submit works, but you want to use the jobserver. This could be my ignorance, but I thought you still had to build and install the jobserver yourself. Hue doesn't seem to include it in CDH 5.2, but I haven't looked at the VM in a while. Are you building the jobserver yourself or not?
Created 12-05-2014 08:25 AM
I'm not building the spark-jobserver myself; however, it certainly seems to be included in the VM in the /var/lib/cloudera-quickstart/spark-jobserver directory, along with some nohup output indicating it is likely running:
[cloudera@quickstart spark-jobserver]$ pwd
/var/lib/cloudera-quickstart/spark-jobserver
[cloudera@quickstart spark-jobserver]$ ls
gc.out  nohup.out  settings.sh  log4j-server.properties  server_start.sh  spark-job-server.jar
There is also a .pid file indicating the spark-jobserver is running, and its pid matches what jps shows, so it certainly seems to be up:
[cloudera@quickstart cloudera-quickstart]$ tail spark-jobserver.pid
1869
[cloudera@quickstart cloudera-quickstart]$ sudo jps | grep JobServer
1869 JobServer
Created 12-05-2014 12:27 PM
Hm, I wonder if the jobserver just needs to be updated in the VM. You could try building and running your own. I've not used the jobserver myself. Someone else may have more insight or it may be a good question for the VM forum.
Created 12-05-2014 06:18 PM
It's quite possible it needs to be updated in the VM. It's included in the VM purely as a convenience: as it's not an officially supported or included part of CDH, it doesn't go through all the same testing as everything else. If it does need to be updated, I can look at doing that for the next release and will post back here when it is.
Created 12-07-2014 07:23 AM
Just to follow up: check out this thread (http://community.cloudera.com/t5/Apache-Hadoop-Concepts-and/spark-jobserver-and-spark-jobs-in-Hue-on... where I detail rebuilding the spark-jobserver and getting things to work. So it does look like the problem I encountered was that the CDH 5.2 QuickStart VM shipped a spark-jobserver compiled against Spark 0.9.1, which is incompatible with Spark 1.1.0 and caused the error I hit.
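For anyone who lands here first, the rebuild went roughly like this (a sketch from memory; the linked thread has the exact steps, and the sbt task name may differ between jobserver revisions):

git clone https://github.com/ooyala/spark-jobserver.git
cd spark-jobserver
# bump the Spark dependency in the sbt build to 1.1.0 before assembling
sbt job-server/assembly
# then replace spark-job-server.jar under /var/lib/cloudera-quickstart/spark-jobserver
# with the newly assembled jar and restart it via server_start.sh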
Thanks,
Chris