
Managing and deploying Spark applications


I'm interested to hear what others are doing about deploying Spark applications to their clusters.

Currently I use Oozie to manage MapReduce / Hive workflows.  It's not perfect (far from it), but at least the Hue GUI offers a nice view of the workflow and clearly indicates when a stage has failed.

I'm interested to hear what people are doing in Spark-land.  Currently I've got a Spark application running nightly.  I'm using Oozie to run a shell script that launches the Spark script with: spark-shell < myscript.scala
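Roughly, that wrapper script is the sketch below (the shebang is mine; the command itself is exactly what I run today):

    #!/bin/bash
    # Called from an Oozie shell action: pipe the Scala source into the Spark REPL.
    # Catch: spark-shell exits 0 even when the script inside it fails, so Oozie
    # has no reliable way to tell success from failure.
    spark-shell < myscript.scala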

That's about as nasty as it gets.  I can think of a few alternatives:

  • Build my script into a jar.  Use an Oozie shell task to spark-submit it to the cluster.  That's not a whole lot better than the first case, but I'd probably get a more sensible return code which Oozie could test for (spark-shell always exits successfully, as you'd expect).  See the sketch after this list.
  • Write a Spark app with a long-running driver that sleeps / loops.  That would let me monitor the application through the Spark Master GUI.  I'm not sure how many resources a long-running driver consumes - does it reserve memory for workers?
  • A crontab and spark-submit. Easier to configure than Oozie, but with almost no 'free' monitoring available.

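To make the first option (or the crontab variant) concrete, here's roughly what I have in mind; the class name, jar path, master setting, schedule, and log path are placeholders, not my real setup:

    #!/bin/bash
    # Submit the packaged job.  Unlike spark-shell, spark-submit should hand back
    # a non-zero exit code when the application fails, which is the "more sensible
    # return code" an Oozie shell action (or cron) could test for.
    spark-submit \
      --master yarn-cluster \
      --class com.example.NightlyJob \
      /path/to/nightly-job.jar "$@"

    # Crontab version of the same idea:
    # 0 2 * * * /path/to/run_nightly_job.sh >> /var/log/nightly-job.log 2>&1
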
Is there an alternative?  What do others do?

Thanks


Re: Managing and deploying Spark applications

If you use a more recent Oozie release, you can directly use the Spark action instead: http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html
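Per that documentation, the action in workflow.xml looks roughly like the sketch below; the master, class, jar path, spark-opts, and arg values are only examples:

    <action name="spark-job">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>Nightly Spark job</name>
            <class>com.example.NightlyJob</class>
            <jar>${nameNode}/apps/nightly/nightly-job.jar</jar>
            <spark-opts>--executor-memory 2G --num-executors 4</spark-opts>
            <arg>${inputDir}</arg>
            <arg>${outputDir}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

Since it's a regular workflow action, a failed Spark job should show up in Oozie (and Hue) the same way a failed Hive or MapReduce action does, with no shell-script wrapper needed.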