I have a question about Storm Trident's exactly-once semantics and how it would behave in the following scenario:
Suppose I have a topology that sinks to 3 outputs: a Kafka topic, an HBase table, and HDFS via an HDFS bolt. When a Trident batch is written to Kafka or HBase, you get a strong guarantee of whether the write was actually acked. For writes to HDFS you don't have that.
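For context, here is roughly how I wire the HDFS leg (a simplified sketch: the field names, path, and fs URL are placeholders, and on the Trident side I'm going through storm-hdfs's HdfsState/HdfsUpdater rather than the plain, non-Trident HdfsBolt):

```java
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.trident.HdfsState;
import org.apache.storm.hdfs.trident.HdfsStateFactory;
import org.apache.storm.hdfs.trident.HdfsUpdater;
import org.apache.storm.trident.Stream;
import org.apache.storm.trident.state.StateFactory;
import org.apache.storm.tuple.Fields;

public class HdfsSinkWiring {

    // Builds the HDFS state factory: delimited text records, rotated at 64 MB.
    static StateFactory hdfsStateFactory(Fields hdfsFields) {
        FileNameFormat fileNameFormat = new DefaultFileNameFormat()
                .withPath("/trident/events")        // placeholder path
                .withPrefix("events")
                .withExtension(".txt");

        RecordFormat recordFormat = new DelimitedRecordFormat()
                .withFields(hdfsFields);            // comma-delimited by default

        FileRotationPolicy rotationPolicy =
                new FileSizeRotationPolicy(64.0f, Units.MB);

        HdfsState.Options options = new HdfsState.HdfsFileOptions()
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(recordFormat)
                .withRotationPolicy(rotationPolicy)
                .withFsUrl("hdfs://namenode:8020"); // placeholder URL

        return new HdfsStateFactory().withOptions(options);
    }

    // The Kafka and HBase sinks hang off the same stream through their own
    // partitionPersist calls (omitted here); this is just the HDFS leg.
    static void wireHdfsSink(Stream events) {
        Fields hdfsFields = new Fields("eventId", "payload");
        events.partitionPersist(hdfsStateFactory(hdfsFields), hdfsFields,
                new HdfsUpdater(), new Fields());
    }
}
```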
So does the HDFS sink offer the very same strong exactly-once guarantee? What scenarios would, or could, result in Trident batches being written to HDFS twice? Or is that a negligible risk? I need to know whether there is any reason to build deduplication logic over the data that lands in HDFS via the Storm bolt.
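If duplicates are possible, the kind of thing I'd rather not have to write is a downstream cleanup pass like the sketch below (purely hypothetical: it assumes every record carries a unique eventId as its first comma-delimited column):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

public class DedupSketch {
    public static void main(String[] args) throws IOException {
        // Keep only the first occurrence of each eventId; any later occurrence
        // would be a Trident batch that was replayed into HDFS.
        Set<String> seen = new HashSet<>();
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            lines.filter(line -> seen.add(line.split(",", 2)[0]))
                 .forEach(System.out::println);
        }
    }
}
```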