
Monitoring Kafka backlog for spark streaming application


I am trying to build my own tool (scripts) to monitor the Kafka backlog for a Spark Streaming application that uses Kafka. I am using Spark's createDirectStream API, so there is no consumer registered in Kafka, and because of that I am not able to monitor the backlog on the Kafka side. Is there a way I can monitor the Kafka backlog with this approach?



You have to do it from the Kafka side. From Kafka's perspective, it does have consumers connected and it can see them. You can figure out the backlog in two steps:


1) Find out all the connected consumers:

kafka-run-class kafka.admin.ConsumerGroupCommand --bootstrap-server {kafka_host}:{port} --list --new-consumer

and work out which consumer group name is used by your Spark application.

2) Then use the following command to monitor the backlog (the lag):

kafka-run-class kafka.admin.ConsumerGroupCommand --bootstrap-server {kafka_host}:{port} --describe --new-consumer --group <group_name>

Note: I assume your Spark application is using the new Kafka consumer API. Otherwise, you have to use the "--zookeeper" option instead of "--bootstrap-server" and remove "--new-consumer" in the commands above.
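Since the goal is a monitoring script, the `--describe` output above can be parsed to get a per-topic lag total. This is only a sketch: the exact column layout of ConsumerGroupCommand's output varies between Kafka versions, so the sample table below is a hypothetical example assuming whitespace-separated columns with TOPIC and LAG headers.

```python
def parse_lag(describe_output):
    """Return total lag per topic from ConsumerGroupCommand --describe output.

    Assumes a whitespace-separated table whose header row contains
    TOPIC and LAG columns; adjust the column names for your Kafka version.
    """
    lines = [l for l in describe_output.strip().splitlines() if l.strip()]
    header = lines[0].split()
    topic_idx = header.index("TOPIC")
    lag_idx = header.index("LAG")
    totals = {}
    for line in lines[1:]:
        cols = line.split()
        try:
            lag = int(cols[lag_idx])
        except (IndexError, ValueError):
            continue  # skip rows where lag is absent or shown as "-"
        topic = cols[topic_idx]
        totals[topic] = totals.get(topic, 0) + lag
    return totals

# Hypothetical sample output for illustration only:
sample = """\
GROUP      TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
my-group   events   0          120             150             30   consumer-1
my-group   events   1          200             215             15   consumer-1
"""
print(parse_lag(sample))  # {'events': 45}
```

In a real script you would feed `parse_lag` the captured stdout of the kafka-run-class command (e.g. via subprocess) and alert when the total crosses a threshold.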


Thanks for the response, Zhang. But I don't have a consumer in Kafka, as I am using the direct stream approach (no receiver). I tried specifying Kafka properties like "" and "" in my Spark application, but with no luck; they didn't show up in my Kafka consumer list.


P.S.: I am using the old Kafka consumer API.


This is something I found on Stack Overflow.


Is there any other way of monitoring the backlog with this approach? I tried exploring the Spark Metrics API, but even that is of no big use in this case.
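With the direct stream and the old consumer API there is no group for Kafka to track, so a common workaround is to compute the lag yourself: record the offsets Spark actually processed (available from each batch's OffsetRange objects) and compare them against the brokers' latest log-end offsets (fetched with an offset request, e.g. via a Kafka client library). The sketch below assumes you already have both as plain dicts keyed by (topic, partition); how you fetch and persist them is left out.

```python
def compute_backlog(log_end_offsets, processed_offsets):
    """Backlog per (topic, partition): the broker's latest offset minus
    the offset the streaming app has processed so far. Partitions the
    app has not touched yet count from offset 0."""
    return {
        tp: end - processed_offsets.get(tp, 0)
        for tp, end in log_end_offsets.items()
    }

# Hypothetical numbers for illustration:
latest = {("events", 0): 150, ("events", 1): 215}   # from the brokers
done = {("events", 0): 120}                          # from OffsetRange; partition 1 unprocessed
print(compute_backlog(latest, done))  # {('events', 0): 30, ('events', 1): 215}
```

If you also write the processed offsets back to ZooKeeper (or, on newer clients, commit them to Kafka), the stock tools like ConsumerGroupCommand can then report the lag for you.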