Created 01-14-2016 07:22 PM
GroupID/ClientId :
I am reading from Kafka via the Trident Kafka spout (opaque transactional spout). On restart, if I change the clientId (passed into TridentKafkaConfig), the spout does not start reading from the beginning of the data.
Is clientId the same as groupId?
But if I change the stream name, the spout starts reading data from the beginning.
Created 01-16-2016 12:40 PM
@Narendra Bidari clientId and groupId are not the same. clientId is a user-specified string that is sent along with every request to help with tracing and debugging. groupId, on the other hand, is a unique identifier for a group of consumer processes. Because the Kafka read offset is stored in ZooKeeper for your groupId, you don't start reading messages from the beginning of that topic again. This is also why you are able to read the entire topic when you change the topic name: no previous offset has been stored. Hope this helps.
Created 01-15-2016 04:31 AM
Created 01-18-2016 06:50 PM
@Jeremy Dyer: Thanks for the answer. I now understand that clientId is not the same as groupId.
I could not follow the second part of the answer.
My understanding: if we are consuming data from Kafka/ZooKeeper, an offset is maintained in ZooKeeper under folders such as transactional or consumers, keyed by the group id.
In TridentKafkaConfig there is no option to specify a groupId at all. Is groupId the same as the stream id? If so, where is its offset saved in ZooKeeper?
I don't see any offset on the source Kafka/ZooKeeper (in the ZooKeeper folder /transactional).
Created 02-02-2016 03:06 PM
@Narendra Bidari has this been resolved? Can you post your solution or accept best answer?
Created 07-27-2016 11:25 AM
For Trident: it maintains the offset in ZooKeeper under the folder [stream-name].
So I think the stream name must be acting as the consumer group id.
Created 07-27-2016 02:35 PM
@Amber Kulkarni: Yes, you are correct, the stream name acts as the consumer group id.
Created 07-28-2016 05:55 AM
This is about the second part of the question, regarding the group id. You can try setting a txnId value for the spout with the API below. It acts like a consumer group id and is used to maintain the opaque transactional spout's state in ZK.
TridentTopology topology = new TridentTopology();
Stream stream = topology.newStream(txnId, spout);
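To make the behaviour discussed in this thread concrete, here is a small toy model in plain Java. It is only a sketch and does not use Storm or ZooKeeper: `OffsetStoreSketch`, `startOffset`, and `commit` are hypothetical names, the in-memory map stands in for the state kept in ZK, and offset 0 stands in for "beginning of the topic" (where a real spout starts without stored state depends on its configuration). The point it illustrates is that the stored offset is looked up by the stream name (txnId) only, so changing the clientId has no effect while a new stream name finds no stored offset.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory stand-in for the offsets kept in ZooKeeper. Not Storm's API.
class OffsetStoreSketch {
    // Committed offset per stream name (txnId); the key deliberately excludes clientId.
    private final Map<String, Long> committed = new HashMap<>();

    // Where a restarted spout would resume in this model:
    // the stored offset for the txnId if one exists, otherwise 0.
    long startOffset(String txnId, String clientId) {
        // clientId is accepted only to show that it plays no part in the lookup.
        return committed.getOrDefault(txnId, 0L);
    }

    // Record the next offset to read for the given txnId.
    void commit(String txnId, long nextOffset) {
        committed.put(txnId, nextOffset);
    }
}

class OffsetStoreDemo {
    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();
        store.commit("stream-a", 42L);

        // Same stream name, original clientId: resumes at the stored offset.
        System.out.println(store.startOffset("stream-a", "client-1")); // prints 42
        // Same stream name, new clientId: still resumes at the stored offset.
        System.out.println(store.startOffset("stream-a", "client-2")); // prints 42
        // New stream name: nothing stored, so reading starts from the beginning.
        System.out.println(store.startOffset("stream-b", "client-1")); // prints 0
    }
}
```

In this model, changing the clientId on restart corresponds to the second call, and renaming the stream corresponds to the third, which matches what was observed in the original question.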