I have some spark-streaming applications that ingest data from Kafka and insert into ElasticSearch and Cassandra.
Currently I need to deploy those applications on YARN so they are running 24/7.
What is the best tool for this task? I have been considering Oozie, but it doesn't feel like a tool designed to deploy jobs that run 24/7. Is there anything else I could use? Also, are Spark jobs the best option for moving data from Kafka to other systems, or should I consider using Flume instead?
Since you are ingesting data from Kafka, you must be using Spark Streaming, which is a never-ending job until you stop it, so I don't think you need to run it from a scheduler.
That was a mistake on my side: I am indeed using Spark Streaming for this (I will edit the original question). What I need is something that ensures the job is automatically re-submitted if it dies for some reason. Any idea which tool could help with that?
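One common approach (a sketch, not a complete answer) is to let YARN itself restart the driver: submit the job in `cluster` deploy mode and raise `spark.yarn.maxAppAttempts`, so YARN re-launches the application master when it fails. The application name and jar path below are placeholders for your own job:

```shell
# Submit the streaming job so YARN supervises and restarts it on failure.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --name my-kafka-to-es-job \
  # Allow YARN to re-attempt the application master if it dies:
  --conf spark.yarn.maxAppAttempts=4 \
  # Only count failures within this window toward the attempt limit,
  # so a long-running job isn't killed for crashes weeks apart:
  --conf spark.yarn.am.attemptFailureValidityInterval=1h \
  my-streaming-job.jar
```

Without `attemptFailureValidityInterval`, a 24/7 job would eventually exhaust its attempts even if the failures are months apart; with it, only recent failures count against the limit. For restarts beyond what YARN provides (e.g. after a full application failure), an external supervisor such as a systemd unit or a monitoring script that re-runs `spark-submit` is a common complement.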