Read from sharded RabbitMQ and write to partitioned Kafka with exactly-once semantics

Problem: Read from sharded RabbitMQ and write to Kafka with exactly-once semantics.

==> I am planning to use Trident for this.

  • I have used the RabbitMQ spout for Storm, which extends BaseRichSpout (https://github.com/ppat/storm-rabbitmq).
    1. I can see that it is a non-transactional spout, and hence it would not support Trident's exactly-once semantics. Am I correct?
  • If I need to build an opaque transactional spout for RabbitMQ, can you give me some hints on how to start, or some examples to look into?
  • I want to write a unit test to check whether my Trident topology supports exactly-once semantics. Are there any sample unit tests or examples? The reason I want to write the tests is that, as I understand it, exactly-once semantics depend on the spout as well, not just on Trident or Storm.
  • I have to support sharded RabbitMQ, so I need to connect to more than one queue. I am thinking of creating one stream per queue and then combining them with Trident's merge(List<Stream>). Is this a good idea? Let me know your thoughts.
  • https://nathanmarz.github.io/storm/doc/storm/tride...
  • @Ram Sriharsha
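
To make the opaque transactional question above more concrete, here is a minimal sketch of the bookkeeping that Trident's state documentation describes for opaque spouts: the state keeps the current value, the previous value, and the txid of the last batch applied. The class and method names (OpaqueCounter, apply, value) are hypothetical and not part of any Storm API, and the sketch assumes batches reach the state in txid order. It is only meant as a starting point for the kind of replay test the question asks about.

```java
/** Sketch of the value / previous value / txid bookkeeping used with opaque spouts. */
class OpaqueCounter {
    private long txid = -1; // id of the last batch applied; -1 means none yet
    private long prev = 0;  // value before that batch was applied
    private long curr = 0;  // value after that batch was applied

    /** Applies a batch's partial count; replaying the same txid overwrites instead of adding. */
    void apply(long batchTxid, long partialCount) {
        if (batchTxid == txid) {
            // Replay of the last batch (its contents may differ): rebuild from the value before it.
            curr = prev + partialCount;
        } else {
            // New batch: remember the old value, then add the partial count.
            prev = curr;
            curr = curr + partialCount;
            txid = batchTxid;
        }
    }

    long value() {
        return curr;
    }
}
```

A replay test could apply batch 2 twice with different contents and check that only the last attempt is counted. A plain counter that adds on every call would over-count in that case.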

    1 ACCEPTED SOLUTION

    Explorer
    • Apache Storm does not support the storm-rabbitmq spout that you mention. It does not ship with the Apache Storm distribution, so I have not looked at the code.
    • Here is a JMS spout that you may be able to use as is with RabbitMQ, if RabbitMQ supports JMS: https://github.com/hortonworks/storm/tree/2.3-mai... (Note that this is available in the HWX repo but not under Apache.)
    • For an example of an opaque spout, take a look at the kafka-spout: https://github.com/apache/storm/blob/master/extern...
    • Unit testing exactly-once behavior is kind of hard in my opinion, and there are currently no examples.
    • I am not sure I completely understand the sharded queues question. Your spout could read from N queues while outputting all the messages to a single stream; that is completely up to your spout. The merge can be useful if you want a one spout -> one queue mapping and then merge all the output streams into a single stream that other processing units (Trident states) subscribe to.
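
The "one spout reads from N queues" option in the last point can be sketched with in-memory queues. This is only an illustration of the polling idea, not Storm or RabbitMQ client code: MultiQueueReader and its methods are hypothetical names, and round-robin is just one possible polling order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** In-memory stand-in for one spout that drains several sharded queues into a single stream. */
class MultiQueueReader {
    private final List<Deque<String>> shards = new ArrayList<>();
    private int next = 0; // index of the shard to poll next

    void addShard(List<String> messages) {
        shards.add(new ArrayDeque<>(messages));
    }

    /** Returns the next message in round-robin order, or null when every shard is empty. */
    String poll() {
        for (int tried = 0; tried < shards.size(); tried++) {
            Deque<String> shard = shards.get(next);
            next = (next + 1) % shards.size();
            if (!shard.isEmpty()) {
                return shard.removeFirst();
            }
        }
        return null;
    }

    /** Drains all shards into one list, i.e. the single output stream. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        for (String m = poll(); m != null; m = poll()) {
            out.add(m);
        }
        return out;
    }
}
```

In a real spout the poll step would sit inside nextTuple, and acking and replay would still have to be tracked per queue. The one spout per queue plus merge layout moves that bookkeeping into separate spout instances instead.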

    2 REPLIES

    Explorer
    • Out of the box, RabbitMQ can support single delivery of a message (exactly once) at the routing level; it really depends on how the RabbitMQ routing topology has been created. For instance, if you have a single direct exchange bound to a single queue, then it is impossible for another queue to receive the message.
    • Instead of merging streams, you may want to let RabbitMQ do this. Of course, once again it depends on your RabbitMQ topology.
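
As a rough illustration of the routing point above, here is a toy in-memory model of a direct exchange, where a message is copied to every queue whose binding key exactly matches the routing key. DirectExchangeModel is a hypothetical class for illustration only and does not use the RabbitMQ Java client. The queue and routing key names in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a direct exchange: a message goes to every queue bound with a matching key. */
class DirectExchangeModel {
    private final Map<String, List<String>> bindings = new HashMap<>(); // routing key -> queue names
    private final Map<String, List<String>> queues = new HashMap<>();   // queue name -> messages

    /** Binds a queue to a routing key; repeating the same binding has no extra effect. */
    void bind(String queue, String routingKey) {
        queues.computeIfAbsent(queue, q -> new ArrayList<>());
        List<String> bound = bindings.computeIfAbsent(routingKey, k -> new ArrayList<>());
        if (!bound.contains(queue)) {
            bound.add(queue);
        }
    }

    /** Routes a message and returns how many queues received a copy. */
    int publish(String routingKey, String message) {
        List<String> targets = bindings.getOrDefault(routingKey, new ArrayList<>());
        for (String queue : targets) {
            queues.get(queue).add(message);
        }
        return targets.size();
    }

    List<String> messages(String queue) {
        return queues.getOrDefault(queue, new ArrayList<>());
    }
}
```

Binding one queue, say "events", to both "shard-0" and "shard-1" makes the broker do the merging: each publish lands in exactly one queue, and a single consumer sees both shards. This models routing only; redelivery to a consumer after a failure is a separate concern.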