
Best tool to deploy Spark


I have some spark-streaming applications that ingest data from Kafka and insert it into Elasticsearch and Cassandra.

Currently I need to deploy those applications on YARN so they are running 24/7.

What is the best tool for this task? I have been considering Oozie, but it doesn't feel like a tool designed to deploy jobs that run 24/7. Is there anything else I could use? Are Spark jobs the best option for moving data from Kafka to other systems, or should I consider using Flume?

3 REPLIES

Re: Best tool to deploy Spark

Since you are ingesting data from Kafka, you must be using spark-streaming, which is a never-ending job that runs until you stop it, so I don't think you need to run it from a scheduler.


Re: Best tool to deploy Spark

This was a mistake on my side; I am using spark-streaming for this (I will edit the original question). I need something that ensures the job is resubmitted automatically if it dies for some reason. Any idea which tool could help with this?


Re: Best tool to deploy Spark

@Jose Luis Navarro Vicente you can try https://github.com/spark-jobserver/spark-jobserver. Once the job is submitted, check its status periodically and resubmit it if it got killed.
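
The "check the status and resubmit" loop could look roughly like the Python sketch below. It is only an illustration, not part of spark-jobserver: `supervise`, `submit`, `get_status`, and the `DEAD_STATES` set are hypothetical names, and the two callables are placeholders you would back with whatever your job server or cluster actually exposes for submitting a job and reading its state.

```python
import time
from typing import Callable

# Assumed terminal states, for illustration only; use the states
# that your job server or cluster manager actually reports.
DEAD_STATES = {"KILLED", "ERROR", "FINISHED"}


def supervise(submit: Callable[[], str],
              get_status: Callable[[str], str],
              poll_seconds: float = 30.0,
              max_restarts: int = 5,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Submit a job, poll its status, and resubmit it whenever it dies.

    `submit` (hypothetical) starts the job and returns its id.
    `get_status` (hypothetical) returns the current state for a job id.
    Returns the number of restarts performed before giving up.
    """
    job_id = submit()
    restarts = 0
    while True:
        sleep(poll_seconds)
        status = get_status(job_id)
        if status not in DEAD_STATES:
            continue  # still alive, keep polling
        if restarts >= max_restarts:
            return restarts  # stop instead of restarting forever
        job_id = submit()  # the job died, so submit a fresh copy
        restarts += 1
```

The `max_restarts` cap is there so that a job which crashes immediately on every start does not get resubmitted in an endless loop; you may prefer to alert someone at that point instead of just returning.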