I was recently working with a customer to deploy a Storm topology in their dev environment. I had built and tested the topology in my environment but when we went to deploy it at the customer site, nothing happened - literally.
We could see the topology was in fact running but it wasn’t consuming any data from the Kafka topic to which it was subscribing. We verified that data was in fact being written there using the kafka-console-consumer.sh utility but still the topology wasn’t doing anything and even more confusing, the storm log for the topology didn’t show anything beyond the deployment steps.
Going into the Storm UI we wanted to drill down to get the specifics of the topology and potentially turn on debug logging to see what was up, however when we clicked on the topology the Storm UI is unresponsive trying to load the topology summary. Even more confusing was the fact that the Storm UI didn’t show any error messages.
Turns out that at some point in the past in this development environment, the originally installed Kafka brokers had been removed, then later reinstalled. The new brokers are given new broker IDs by the installer but the metadata stored in ZK regarding the customer offsets, still had pointers to the old broker IDs.
Running ./kafka-topics.sh --describe on the queue in question showed that it was referencing the newer broker IDs but running the same command on the __consumer_offsets queue showed pointers to the old IDs. The all important __consumer_offsets queue is where consumers store the offset of the last record they read. This queue is automatically created by the system. In our case, ZK still stored the metadata to the old broker IDs when this topic was initially. Because our clients (both the client running in the topology and the Storm UI client) were getting the wrong info for the broker IDs, they were unable to run and manifested as the behavior we were seeing. To resolve this we need to go into ZK and delete this metadata.
Run the zkCli.sh utility from any ZK client machine in your cluster. Once your in delete the metadata for the __consumer_offsets topic by running the following command
Exit out of ZK then restart your Kafka brokers.
After removing this info and restarting, we re-launched our Storm topology and navigated to this in our Storm UI. The hung state of the UI was resolved and we saw that our Kafka spout was now successfully consuming data.
Here are some other additional links for more information on offsets and the __consumer_offsets queue. Good luck!