
Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName


GroupID/ClientId :

I am reading from Kafka via the Trident Kafka spout (opaque transactional spout). On restart, if I change the clientId (passed into TridentKafkaConfig), the spout does not start reading from the initial data point.

Is clientId the same as groupId?

But if I change the stream name, the spout starts reading data from the beginning.

https://github.com/apache/storm/blob/master/external/storm-kafka/src/jvm/org/apache/storm/kafka/trid...

@Ram Sriharsha

1 ACCEPTED SOLUTION

Guru

@Narendra Bidari clientId and groupId are not the same. clientId is a user-specified string that is sent along with every request to help with tracing and debugging. groupId, on the other hand, is a unique identifier for a group of consumer processes. Because the Kafka read offset is stored in ZooKeeper for your groupId, you don't start reading messages from the beginning of that topic. This is why you are able to read the entire topic when you change the stream name: no previous offset has been stored for it. Hope this helps.



@Jeremy Dyer : Thanks for the answer. I now understand that clientId is not the same as groupId.

I could not follow the second part of the answer.

My understanding: when we consume data from Kafka, the consumer's offset is maintained in ZooKeeper under a folder such as transactional or consumers, keyed by group id.

In TridentKafkaConfig there is no option to specify a groupId at all. Is groupId the same as the stream ID? If so, where is its offset saved in ZooKeeper?

I don't see any offset in the source Kafka ZooKeeper (in the ZooKeeper folder /transactional).

Master Mentor

@Narendra Bidari Has this been resolved? Can you post your solution or accept the best answer?


@Narendra Bidari

For Trident: it maintains the offset in ZooKeeper under the folder [stream-name].

So I think the stream name must be acting as the consumer group id.


@Amber Kulkarni : Yes, you are correct: the stream name acts as the group id.

Contributor

@Narendra Bidari

This concerns the second part of the question, about the group id. You can try setting a txnId value for the spout with the API below; it acts like a consumer group id and is used to maintain the opaque transactional spout's state in ZooKeeper.

TridentTopology topology = new TridentTopology();
Stream stream = topology.newStream(txnId, spout);  // txnId is the stream name
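
To make the thread's conclusion concrete, here is a toy model in plain Java. It is not Storm code and does not use the Storm or Kafka APIs. It only mimics the idea that committed offsets are looked up by stream name (txnId), while clientId plays no part in the lookup. The class name OffsetStoreSketch, the path layout, and the choice of 0 as the "no stored offset" starting point are all assumptions made for illustration; a real spout starts wherever its configuration tells it to when no offset is found.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Storm's implementation.
class OffsetStoreSketch {
    // Stands in for ZooKeeper: path -> last committed offset.
    private final Map<String, Long> store = new HashMap<>();

    // Hypothetical path layout keyed by stream name; the real layout may differ.
    public static String pathFor(String streamName, int partition) {
        return "/" + streamName + "/partition_" + partition;
    }

    // Record the last processed offset for a stream name and partition.
    public void commit(String streamName, int partition, long offset) {
        store.put(pathFor(streamName, partition), offset);
    }

    // clientId is accepted but deliberately ignored: it does not affect
    // where the offset is looked up. Returns 0 when nothing is stored,
    // which here stands for "start from the configured initial position".
    public long startOffset(String streamName, String clientId, int partition) {
        return store.getOrDefault(pathFor(streamName, partition), 0L);
    }

    public static void main(String[] args) {
        OffsetStoreSketch zk = new OffsetStoreSketch();
        zk.commit("stream-a", 0, 42L);

        // Same stream name, different clientId: resumes from the stored offset.
        System.out.println(zk.startOffset("stream-a", "client-1", 0)); // prints 42
        System.out.println(zk.startOffset("stream-a", "client-2", 0)); // prints 42

        // New stream name: no stored offset, so reading starts over.
        System.out.println(zk.startOffset("stream-b", "client-1", 0)); // prints 0
    }
}
```

In this model, changing only the clientId leaves the lookup key untouched, which matches the behaviour reported in the question, while a new stream name yields a key with no stored offset.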