Created 11-28-2016 09:37 AM
I have some spark-streaming applications that ingest data from Kafka and insert it into Elasticsearch and Cassandra.
Currently I need to deploy those applications on YARN so they are running 24/7.
What is the best tool for this task? I have been pondering Oozie, but it doesn't feel like a tool designed to deploy jobs that run 24/7. Is there anything else I could use? Also, are Spark jobs the best option for moving data from Kafka to other systems, or should I consider using Flume?
Created 11-28-2016 09:48 AM
Since you are ingesting data from Kafka, you must be using Spark Streaming, which is a never-ending job until you stop it, so I don't think you need to run it from a scheduler.
Created 11-28-2016 09:51 AM
That was a mistake on my side: I am indeed using Spark Streaming for this (I will edit the original question). What I need is something that ensures the job is automatically re-submitted if it dies for some reason. Any idea which tool could help with this?
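One option worth knowing about: when you submit in YARN cluster mode, the driver runs inside the ApplicationMaster, and YARN itself can restart a failed attempt via Spark's `spark.yarn.maxAppAttempts` and `spark.yarn.am.attemptFailuresValidityInterval` settings. A minimal sketch (the jar name, main class, and the specific numbers are illustrative placeholders, not from this thread):

```shell
# Sketch: long-running streaming job with YARN-managed restarts.
# YARN will retry the ApplicationMaster up to 4 times; the validity
# interval resets the attempt counter after an hour of healthy running,
# so occasional crashes don't eventually exhaust the retry budget.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  --class com.example.KafkaToEsJob \
  my-streaming-job.jar
```

This only covers restarts within one YARN application; if the application itself exhausts its attempts and fails, you still need external monitoring to submit a fresh one.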
Created 11-28-2016 09:56 AM
@Jose Luis Navarro Vicente you can try https://github.com/spark-jobserver/spark-jobserver; once the job is submitted, check its status and resubmit it if it gets killed.
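The check-and-resubmit loop described above can also be done against plain YARN, without the job server, by polling `yarn application -status` and parsing the `State` field from its report. A hypothetical watchdog sketch, assuming you already know the application id and the resubmit command (both are placeholders, not from this thread):

```python
# Hypothetical watchdog: poll YARN for a streaming app's state and
# resubmit the job once it leaves the live states. The app id, resubmit
# command, and polling interval below are assumptions for illustration.
import re
import subprocess
import time

def parse_yarn_state(report: str) -> str:
    """Extract the 'State' field from a `yarn application -status` report."""
    match = re.search(r"^\s*State\s*:\s*(\S+)", report, re.MULTILINE)
    return match.group(1) if match else "UNKNOWN"

def app_state(app_id: str) -> str:
    # Shell out to the YARN CLI and parse its report text.
    out = subprocess.run(
        ["yarn", "application", "-status", app_id],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_yarn_state(out)

def watchdog(app_id: str, resubmit_cmd: list, interval: int = 60) -> None:
    # A production script would also capture the new application id
    # from the spark-submit output and keep watching it.
    while True:
        if app_state(app_id) in ("FAILED", "KILLED", "FINISHED"):
            subprocess.run(resubmit_cmd, check=True)
            break
        time.sleep(interval)
```

Run under cron or a supervisor so the watchdog itself survives host restarts; otherwise you have only moved the single point of failure.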