Storm - missing messages in pipeline

Explorer

Hi all. We are noticing that some messages get lost during Storm processing. Below is a brief outline of our pipeline.

[Image: pipeline diagram, 13712-smouw.png]

We have messages coming into Kafka, which are then consumed by two different Kafka spouts in Storm. One spout writes the message to the raw stream, and the other starts processing the message. We need to store the output of Bolt2 in HDFS and also send it downstream for further processing, which will eventually end up in ADLS as well.

All three HDFS bolts are configured to write to different folder structures in ADLS. Ideally I should see all three messages in ADLS (raw, the output of Bolt2, and the output of Bolt3). However, while raw always gets written, sometimes only one of the outputs (Bolt2 or Bolt3) gets written to ADLS. It is inconsistent which one goes missing. Sometimes both get written. There are no errors or exceptions in the logs.

Has anyone run into such issues? Any insight would be appreciated. Are there any good monitoring tools other than Storm UI that give insight into what is going on? We are using Storm 1.0.1 on HDInsight, hosted on Azure.

Thanks.

1 ACCEPTED SOLUTION

Expert Contributor

@Laxmi Chary thanks for your question. Is there ever a case where the message from Bolt 2 doesn't get written but the one from Bolt 3 does? Are you anchoring tuples in your topology, e.g. collector.emit(tuple, new Values(...)), where the input tuple passed as the first argument is the anchor?

Are you doing any microbatching in your topology?
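
To show why anchoring matters here, below is a rough, self-contained sketch of how Storm's acker tracks a tuple tree by XOR-ing tuple ids. It is a toy model, not Storm code: ToyAcker, AnchoringSketch, the message id and the fixed tuple ids are all made up for illustration (real Storm uses random 64-bit ids). In a real bolt the difference is collector.emit(input, new Values(...)) (anchored) versus collector.emit(new Values(...)) (unanchored).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of tuple-tree tracking; all names here are hypothetical. */
class AnchoringSketch {

    /** Keeps one XOR "ack value" per spout message, similar in spirit to Storm's acker. */
    static class ToyAcker {
        private final Map<String, Long> pending = new HashMap<>();

        /** The spout emitted a root tuple for this message. */
        void track(String spoutMsgId, long rootTupleId) {
            pending.put(spoutMsgId, rootTupleId);
        }

        /** An anchored emit XORs the new child tuple id into the tree. */
        void anchor(String spoutMsgId, long childTupleId) {
            pending.computeIfPresent(spoutMsgId, (k, v) -> v ^ childTupleId);
        }

        /** Acking a tuple XORs its id back out of the tree. */
        void ack(String spoutMsgId, long tupleId) {
            pending.computeIfPresent(spoutMsgId, (k, v) -> v ^ tupleId);
        }

        /** The tree is complete when every id that went in has come out (value is 0). */
        boolean fullyProcessed(String spoutMsgId) {
            Long v = pending.get(spoutMsgId);
            return v != null && v == 0L;
        }
    }

    /**
     * Simulates Bolt2 emitting one child tuple to a downstream bolt that
     * silently drops it (never acks). Returns true if the spout would still
     * see the message as fully processed.
     */
    static boolean spoutAckedDespiteDownstreamFailure(boolean anchored) {
        ToyAcker acker = new ToyAcker();
        String msgId = "kafka-offset-42"; // hypothetical spout message id
        long root = 0x1111L;              // fixed ids for readability only
        long child = 0x2222L;

        acker.track(msgId, root);
        if (anchored) {
            acker.anchor(msgId, child);   // child joins the tuple tree
        }
        acker.ack(msgId, root);           // Bolt2 acks its input tuple
        // The downstream bolt never acks the child tuple.
        return acker.fullyProcessed(msgId);
    }

    public static void main(String[] args) {
        System.out.println("unanchored: " + spoutAckedDespiteDownstreamFailure(false));
        System.out.println("anchored:   " + spoutAckedDespiteDownstreamFailure(true));
    }
}
```

In the unanchored case the tree looks complete as soon as Bolt2 acks, so the spout is acked and a tuple lost further downstream is never replayed and produces no error. In the anchored case the tree stays pending, so in real Storm the spout tuple would eventually time out (or be failed) and be replayed.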

11 REPLIES

Expert Contributor

@Shravanthi please accept the answer if this solved your issue.

Explorer

@Ambud Sharma We are testing this change and will accept once we are done. I am still not 100% convinced that this solves the problem, since the Storm documentation says BasicBolt does the acking and anchoring for you: http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html

Search for BasicBolt in that link and you will find "Storm has an interface called BasicBolt that encapsulates this pattern for you."