I am trying to setup a baseline for performance of our topology. I started with 1 worker node and 1 worker processes
We have 1 Kafka Spout, 1 Processing Bolt and 1 HdfsBolt. I set the parallelism for all of them to 1.
Below are other storm configs. We are using 1 D14_V2 in Azure for our worker/supervisor node. 2 D4 v2 for Nimbus and 3 D12 v2 for Zookeeprs
I deployed the topology to the Storm cluster and pumped messages into Kafka topic. But KafkaSpout doesn't consume any of those messages. I did check to make sure the topic has messages.
In our KafkaSpout, we give a static list of broker nodes list. I tried telnet from the worker node to both Kafka brokers and zookeepers and both of them were successful in getting connections.
Does anyone have any insight into what could be wrong with KafkaSpout thats preventing it from consuming messages from kafka?
Here's a snippet from a working scenario similar to yours that works just fine for me.
TopologyBuilder builder = new TopologyBuilder(); BrokerHosts hosts = new ZkHosts("zk1:2181,zk2:2181,zk3:2181"); SpoutConfig sc = new SpoutConfig(hosts, "s20-logs", "/s20-logs", UUID.randomUUID().toString()); sc.scheme = new SchemeAsMultiScheme(new StringScheme()); KafkaSpout spout = new KafkaSpout(sc); builder.setSpout("log-spout", spout, 1);
If you notice, you'll see that my BrokerHosts are really the list of ZooKeeper instances. I'm running this on a HDP 2.5 cluster which is Storm 1.0.1 and I constructed this ZkHosts class after reviewing the notes in http://storm.apache.org/releases/1.0.1/storm-kafka.html.
Might be worth a try for you. Either way, good luck and Happy Hadooping/Storming!