Explorer
Posts: 14
Registered: 04-26-2017

Monitoring Kafka backlog for spark streaming application


I am trying to build my own tool (a set of scripts) to monitor the Kafka backlog of a Spark Streaming application that reads from Kafka. I am using Spark's createDirectStream API, so no consumer group is registered in Kafka, and because of this I cannot monitor the backlog from the Kafka side. Is there a way to monitor the Kafka backlog with this approach?

Cloudera Employee
Posts: 53
Registered: 03-01-2016

Re: Monitoring Kafka backlog for spark streaming application

You have to do it from the Kafka side. From Kafka's perspective, your application does have consumers connected, and Kafka can see them. You can figure out the backlog in two steps:

 

1) Find all the consumer groups that are connected:

kafka-run-class kafka.admin.ConsumerGroupCommand --bootstrap-server {kafka_host}:{port} --list --new-consumer

Then figure out which consumer group (group name) your Spark application uses.

2) Use the following command to monitor the backlog (the lag) of that group:

kafka-run-class kafka.admin.ConsumerGroupCommand --bootstrap-server {kafka_host}:{port} --describe --new-consumer --group <group_name>

Note: I assume your Spark application is using the new Kafka consumer API. Otherwise, you have to use the "--zookeeper" option (pointing at your ZooKeeper host and port) instead of "--bootstrap-server", and remove "--new-consumer" from the commands above.
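The lag that the --describe command reports is computed per partition as the log-end offset minus the group's committed offset, and the group's total backlog is the sum of those values. As a minimal sketch of that arithmetic, assuming you already have both offsets for each partition (the function names and the sample topic "events" are hypothetical, not part of any Kafka tool), in Python:

```python
def partition_lags(log_end_offsets, committed_offsets):
    """Return {(topic, partition): lag}, where lag = log-end offset - committed offset.

    Both arguments map (topic, partition) -> offset. If a partition has no
    committed offset, its lag is unknown and is reported as None.
    """
    lags = {}
    for tp, end in log_end_offsets.items():
        committed = committed_offsets.get(tp)
        # No committed offset yet: the lag is unknown rather than zero.
        lags[tp] = None if committed is None else end - committed
    return lags


def total_backlog(log_end_offsets, committed_offsets):
    """Sum of the known per-partition lags: the group's overall backlog."""
    lags = partition_lags(log_end_offsets, committed_offsets)
    return sum(lag for lag in lags.values() if lag is not None)


if __name__ == "__main__":
    # Hypothetical offsets for a two-partition topic named "events".
    end = {("events", 0): 1500, ("events", 1): 980}
    committed = {("events", 0): 1200, ("events", 1): 980}
    print(partition_lags(end, committed))  # {('events', 0): 300, ('events', 1): 0}
    print(total_backlog(end, committed))   # 300
```

Reporting an unknown lag as None rather than 0 keeps a partition that has never committed an offset from looking fully caught up.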
Explorer

Re: Monitoring Kafka backlog for spark streaming application


Thanks for the response, Zhang. However, I don't have a consumer group in Kafka, because I am using the direct stream approach (no receiver). I tried setting Kafka properties such as "group.id" and "consumer.id" in my Spark application, but with no luck: they didn't show up in my Kafka consumer list.

P.S.: I am using the old Kafka consumer API.
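This matches how the direct approach works: according to the Spark Streaming + Kafka (0.8) integration guide, createDirectStream does not update offsets in ZooKeeper, so ZooKeeper-based monitoring tools will not show its progress. The offsets processed in each batch are still available through HasOffsetRanges (each OffsetRange carries topic, partition, fromOffset and untilOffset) and can be stored by the application itself. Assuming the application saves the untilOffset of each completed batch somewhere a script can read it (where and how is up to you), the backlog per partition is the latest offset in Kafka minus that untilOffset, because untilOffset is exclusive. A Python sketch of only that arithmetic; the OffsetRange named tuple and backlog_after_batch are hypothetical stand-ins, not Spark's classes, and fetching the latest offsets from Kafka is left out:

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields of Spark's OffsetRange
# (topic, partition, fromOffset, untilOffset); not Spark's actual class.
OffsetRange = namedtuple("OffsetRange", "topic partition from_offset until_offset")


def backlog_after_batch(offset_ranges, latest_offsets):
    """Per-partition backlog once a batch with these offset ranges has finished.

    until_offset is exclusive, so it is the next offset the stream will read;
    everything from there up to the latest offset in Kafka is still pending.
    latest_offsets maps (topic, partition) -> latest (log-end) offset.
    """
    backlog = {}
    for r in offset_ranges:
        tp = (r.topic, r.partition)
        backlog[tp] = latest_offsets[tp] - r.until_offset
    return backlog


if __name__ == "__main__":
    # Hypothetical values: the last completed batch and the current latest offsets.
    ranges = [OffsetRange("events", 0, 1000, 1200), OffsetRange("events", 1, 900, 980)]
    latest = {("events", 0): 1500, ("events", 1): 980}
    print(backlog_after_batch(ranges, latest))  # {('events', 0): 300, ('events', 1): 0}
```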

Explorer

Re: Monitoring Kafka backlog for spark streaming application

This is something I found on Stack Overflow:

 

https://stackoverflow.com/questions/36508553/how-to-specify-consumer-group-in-kafka-spark-streaming-....

 

Is there any other way to monitor the backlog with this approach? I tried exploring the Spark metrics API, but it is not of much use in this case either.
