Are Failed Messages Reported by a Storm Spout Lost?

Super Collaborator

If a Storm spout reports X messages as failed, are these messages lost, i.e. never replayed? Does that mean they were never processed by the topology? How can I find out which messages were affected?

1 ACCEPTED SOLUTION

It depends on how the spout is implemented. Taking KafkaSpout as an example, a message can be reported as failed for two reasons:

1. A downstream bolt failed to process the tuple or explicitly called collector.fail.

2. A downstream bolt did not acknowledge the tuple within topology.message.timeout.secs, which is 30 seconds by default.

In both cases the spout's failed count goes up, but KafkaSpout replays the messages until they are acknowledged. If you use acking in a storm-core topology, it guarantees at-least-once delivery, i.e. there might be duplicates but no message loss.
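To make the replay behaviour concrete, here is a minimal sketch of an at-least-once spout. It is a toy model, not the Storm API: ToySpout, run, and Result are hypothetical names, and a timeout is treated the same as an explicit fail. It only illustrates why the failed count can go up while no message is lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of at-least-once replay. NOT the Storm API; every name here is hypothetical.
final class ReplaySketch {

    /** Stand-in for a spout: a message stays pending until acked and is re-queued on fail. */
    static final class ToySpout {
        private final Deque<String> queue = new ArrayDeque<>();
        private final Set<String> pending = new HashSet<>();
        int failedCount = 0; // plays the role of the "failed" number shown for the spout

        ToySpout(List<String> messages) { queue.addAll(messages); }

        String nextTuple() {
            String msg = queue.poll();
            if (msg != null) pending.add(msg);
            return msg;
        }

        void ack(String msg) { pending.remove(msg); }

        void fail(String msg) {
            failedCount++;
            pending.remove(msg);
            queue.addLast(msg); // replay instead of dropping
        }
    }

    /** What happened during one run of the toy topology. */
    static final class Result {
        final int failedCount;
        final List<String> attempts;  // every delivery, including replays
        final Set<String> processed;  // messages that were eventually acked

        Result(int failedCount, List<String> attempts, Set<String> processed) {
            this.failedCount = failedCount;
            this.attempts = attempts;
            this.processed = processed;
        }
    }

    /** Runs all messages through a "bolt" that fails each message in failOnce the first time it sees it. */
    static Result run(List<String> messages, Set<String> failOnce) {
        ToySpout spout = new ToySpout(messages);
        List<String> attempts = new ArrayList<>();
        Set<String> processed = new LinkedHashSet<>();
        Set<String> alreadyFailed = new HashSet<>();
        String msg;
        while ((msg = spout.nextTuple()) != null) {
            attempts.add(msg);
            if (failOnce.contains(msg) && alreadyFailed.add(msg)) {
                spout.fail(msg);   // first attempt fails, message is re-queued
            } else {
                processed.add(msg);
                spout.ack(msg);    // acked messages leave the pending set
            }
        }
        return new Result(spout.failedCount, attempts, processed);
    }

    public static void main(String[] args) {
        Result r = run(Arrays.asList("a", "b", "c"), new HashSet<>(Arrays.asList("b")));
        System.out.println("failed=" + r.failedCount
                + " attempts=" + r.attempts + " processed=" + r.processed);
    }
}
```

In this model the message that failed once is delivered twice, so the failed counter is non-zero even though every message ends up processed. That is the "duplicates but no loss" trade-off described above.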

3 REPLIES

Expert Contributor

The Storm platform assumes that all messages will eventually be processed successfully. This implies that any message marked as "failed" was also replayed and successfully processed. I don't believe the platform identifies failed and replayed messages in any unique way.

Documentation reference

"This means the (failed) message is not actually taken off the queue yet, but instead placed in a "pending" state waiting for acknowledgement that the message is completed. While in the pending state, a message will not be sent to other consumers of the queue."

In Storm version apache-storm-1.0.2, there is a field called maxRetries in the retry service provided for the KafkaSpout, and when a bolt fails a message more than maxRetries times, the message is acked by the spout. Does this mean the message is then removed from the Kafka queue? If so, we are actually removing a message without processing it. Isn't that bad?
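The bounded-retry behaviour described here can be sketched as follows. This is a toy model under stated assumptions, not the storm-kafka-client implementation: BoundedRetry, Outcome, and skippedOffsets are hypothetical names. It assumes a message is retried up to maxRetries times and given up on (acked) at the next failure, and it records the offsets that were given up so they can be inspected later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a retry limit. NOT the storm-kafka-client API; every name here is hypothetical.
final class RetryLimitSketch {

    enum Outcome { RETRY, GIVE_UP_AND_ACK }

    static final class BoundedRetry {
        private final int maxRetries;
        private final Map<Long, Integer> failures = new HashMap<>(); // offset -> failures so far
        final List<Long> skippedOffsets = new ArrayList<>();         // offsets acked without being processed

        BoundedRetry(int maxRetries) { this.maxRetries = maxRetries; }

        /** Called each time the message at this offset fails downstream. */
        Outcome onFail(long offset) {
            int seen = failures.merge(offset, 1, Integer::sum);
            if (seen <= maxRetries) {
                return Outcome.RETRY;
            }
            // Retry budget exhausted: remember the offset so the skipped message is not silent.
            failures.remove(offset);
            skippedOffsets.add(offset);
            return Outcome.GIVE_UP_AND_ACK;
        }
    }

    public static void main(String[] args) {
        BoundedRetry policy = new BoundedRetry(2);
        for (int i = 1; i <= 3; i++) {
            System.out.println("failure " + i + " -> " + policy.onFail(42L));
        }
        System.out.println("skipped offsets: " + policy.skippedOffsets);
    }
}
```

Under these assumptions a message that keeps failing is eventually acked without being processed, which is the concern raised in the question. Recording the skipped offsets, for example in a log or a separate topic, is one way to find out afterwards which messages were affected.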