New Contributor
Posts: 3
Registered: 03-07-2016

Spark Streaming application shutdown when Kafka brokers are restarted

I have a Spark Streaming application that uses the direct stream approach to listen to a Kafka topic.

    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    // direct-stream broker list key (was left blank in the original post)
    kafkaParams.put("metadata.broker.list", "broker1,broker2,broker3");
    kafkaParams.put("auto.offset.reset", "largest");

    HashSet<String> topicsSet = new HashSet<String>();
    topicsSet.add("mytopic"); // placeholder; actual topic name omitted in the post

    JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topicsSet);

I notice that when I stop or shut down the Kafka brokers, my Spark application also shuts down.

Here is the Spark submission script:

spark-submit \
--master yarn-cluster \
--files /home/siddiquf/spark/log4j-spark.xml \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.xml" \
--class com.example.MyDataStreamProcessor \

The Spark job submits successfully, and I can track the application driver and worker/executor nodes.

Everything works fine, but my one concern is this: if the Kafka brokers are offline or restarted, shouldn't my application, which is managed by YARN, stay up? Yet it shuts down.

If this is expected behavior, how can we handle the situation with the least maintenance? Keep in mind that the Kafka cluster is not part of the Hadoop cluster and is managed by a different team, which is why our application needs to be resilient.


Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Spark Streaming application shutdown when Kafka brokers are restarted

If the brokers are entirely unavailable, I believe it would fail the job, yes. The job can't read any data, so it can't proceed. I think you'd just have to restart the job if your cluster went offline completely (or try to negotiate more reliability from the other team, if that's really the problem?)
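Since the direct-stream job just fails when no broker is reachable, the lowest-maintenance workaround is usually to have something re-submit it. On YARN you can raise `spark.yarn.maxAppAttempts` so YARN retries the application itself, and/or wrap `spark-submit` in a small supervisor script. Here is a minimal sketch of such a wrapper; the function name and back-off interval are my own, and you would pass it the full `spark-submit` command shown in the question:

```shell
#!/bin/sh
# Hypothetical supervisor: re-runs a command until it exits 0, pausing
# between attempts. Usage:
#   retry_until_success spark-submit --master yarn-cluster ...
retry_until_success() {
  until "$@"; do
    echo "command failed (exit $?); re-submitting after back-off" >&2
    sleep 1   # use something like 30-60s for a real streaming job
  done
}
```

This keeps the resiliency logic outside the Spark application, which matters here since the brokers are managed by a different team and outages are out of your control.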