Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName


New Contributor

GroupId/ClientId:

I am reading from Kafka via the Trident Kafka spout (opaque transactional spout). On restart, if I change the clientId (passed into TridentKafkaConfig), the spout does not start reading from the beginning of the data.

Is clientId the same as groupId?

But if I change the stream name, the spout starts reading data from the beginning.

https://github.com/apache/storm/blob/master/external/storm-kafka/src/jvm/org/apache/storm/kafka/trid...

@Ram Sriharsha

Accepted Solutions
Re: Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName

Guru

@Narendra Bidari clientId and groupId are not the same. ClientId is a user-specified string that is sent along with every request to help with tracing and debugging. groupId, on the other hand, is a unique identifier for a group of consumer processes. Since the Kafka read offset is stored in ZooKeeper for your groupId, you don't start reading messages from the beginning of that topic. This is why you are able to read the entire topic when you change the stream name: no previous offset has been stored under the new name. Hope this helps.


Re: Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName

New Contributor

@Jeremy Dyer: Thanks for the answer. I now understand that clientId is not the same as groupId.

I could not follow the second part of the answer.

My understanding: when we consume data from Kafka, an offset is maintained in ZooKeeper under a folder such as transactional or consumers, together with the group id.

In TridentKafkaConfig there is no option to specify a groupId at all. Is groupId the same as the stream id? If so, where is its offset saved in ZooKeeper?

I don't see any offset in the source Kafka ZooKeeper (in the ZooKeeper folder /transactional).

Re: Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName

Mentor

@Narendra Bidari has this been resolved? Can you post your solution or accept the best answer?

Re: Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName

New Contributor

@Narendra Bidari

For Trident: it maintains the offset in the ZooKeeper folder [stream-name].

So I think the stream name must be acting as the consumer group id.

Re: Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName

New Contributor

@Amber Kulkarni: yes, you are correct, the stream name acts as the consumer group id.

Re: Trident (Opaque Transactional Spout) : Difference between ClientID and STREAMName

New Contributor

@Narendra Bidari

This is about the second part of the question, regarding the group id. You can try setting a txnId value for the spout with the API below; it acts like a consumer group id and is used to maintain the opaque transactional spout's state in ZooKeeper.

TridentTopology topology = new TridentTopology();
Stream stream = topology.newStream(txnId, spout);
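
For anyone landing on this thread later, here is a rough toy model of the behavior described above. It is only a sketch and not Storm's actual implementation: the class OffsetStoreSketch, its methods startOffset and commit, and the in-memory map standing in for ZooKeeper are all made up for illustration, and offset 0 is assumed to mean "beginning of the topic". The point is simply that the saved position is looked up by stream name (txnId), while clientId plays no part in the lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for the ZooKeeper-backed spout state; hypothetical, not Storm code. */
class OffsetStoreSketch {
    // Assumption: one committed offset per stream name (the txnId passed to newStream).
    private final Map<String, Long> committed = new HashMap<>();

    /** Returns the offset a restarted spout would resume from. */
    long startOffset(String streamName, String clientId) {
        // clientId is deliberately ignored: it only labels requests for tracing.
        // An unknown stream name has no saved state, so we assume offset 0 (the beginning).
        return committed.getOrDefault(streamName, 0L);
    }

    /** Records progress for a stream name after a batch has been processed. */
    void commit(String streamName, long nextOffset) {
        committed.put(streamName, nextOffset);
    }

    public static void main(String[] args) {
        OffsetStoreSketch zk = new OffsetStoreSketch();
        zk.commit("stream-a", 500L);

        // Same stream name, different clientId: resumes from the saved offset.
        System.out.println(zk.startOffset("stream-a", "client-1")); // prints 500
        System.out.println(zk.startOffset("stream-a", "client-2")); // prints 500

        // New stream name: nothing saved, so reading starts from the beginning.
        System.out.println(zk.startOffset("stream-b", "client-1")); // prints 0
    }
}
```

In this model, changing only the clientId on restart changes nothing, while changing the stream name behaves like starting a brand-new consumer group, which matches what was observed in the question.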