Explorer
Posts: 22
Registered: 03-04-2014

Managing and deploying Spark applications

Interested to hear what others are doing to deploy Spark applications to their clusters.

Currently I use Oozie to manage MapReduce / Hive workflows.  It's not perfect (far from it), but at least the Hue GUI offers a nice view of the workflow and clearly indicates when a stage has failed.

I'm interested to hear what people are doing in Spark-land.  Currently I've got a Spark application running nightly.  I'm using Oozie to run a shell script that launches the Spark script with: spark-shell < myscript.scala

That's about as nasty as it gets.  I can think of a few alternatives:

  • Build my script into a jar.  Use an Oozie shell task to spark-submit it to the cluster.  That's not a whole lot better than the first case, but I'd probably get a more sensible return code which Oozie could test for (spark-shell always exits successfully, as you'd expect).  See the sketch after this list.
  • Write a Spark app with a long-running driver that sleeps / loops.  That would let me monitor the application through the Spark Master GUI.  I'm not sure how many resources a long-running driver consumes - does it reserve memory for workers?
  • A crontab entry and spark-submit (also sketched below).  Easier to configure than Oozie, but with almost no 'free' monitoring available.

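Roughly what I have in mind for the jar + spark-submit and crontab options (class names, jar names, and paths below are all made up):

    # Package the script as a jar and submit it from an Oozie shell action.
    # In yarn-cluster mode spark-submit should exit non-zero when the
    # application fails, which Oozie can test for.
    spark-submit \
        --class com.example.NightlyJob \
        --master yarn-cluster \
        hdfs:///apps/nightly-job.jar

    # Or the same submit driven by cron instead: nightly at 02:00,
    # appending driver output to a log file.
    0 2 * * * spark-submit --class com.example.NightlyJob --master yarn-cluster /jobs/nightly-job.jar >> /var/log/nightly-job.log 2>&1
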
Is there an alternative?  What do others do?

Thanks

Posts: 1,885
Kudos: 425
Solutions: 300
Registered: 07-31-2013

Re: Managing and deploying Spark applications

If you use a more recent Oozie release, you can directly use the Spark action instead: http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html
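
Roughly, a workflow using that action looks like this (the master, class, and jar values are placeholders for your own job):

    <workflow-app xmlns="uri:oozie:workflow:0.5" name="nightly-spark-wf">
        <start to="spark-node"/>
        <action name="spark-node">
            <!-- The Spark action runs spark-submit for you and reports
                 the result back to Oozie as a normal action status -->
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn-cluster</master>
                <name>NightlyJob</name>
                <class>com.example.NightlyJob</class>
                <jar>${nameNode}/apps/nightly-job.jar</jar>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

Oozie then treats the Spark job like any other action, so you keep the Hue workflow view and the failure reporting you described.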