I have an Ambari-managed 10-node HDP cluster ( 184.108.40.206 ). I am trying to index a NetFlow log file (1 million records) using the CSV parser. Since my cluster nodes have only 8 GB of RAM each, I have throttled indexing traffic by setting spout.maxUncommittedOffsets = 100000 and topology.max.spout.pending = 100 to avoid resource contention in my Storm workers.
My Storm configuration:
Number of Kafka partitions: 5
Number of workers: 5
Storm supervisors: 5
Number of executors: 5
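For reference, here is a sketch of how I applied these settings. The exact shape is an assumption based on Metron's sensor parser config JSON, where spout.* properties go in the spoutConfig map and Storm topology settings in stormConfig; the parser class name is the standard Metron CSV parser:

```json
{
  "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
  "sensorTopic": "netflow",
  "spoutParallelism": 5,
  "spoutConfig": {
    "spout.maxUncommittedOffsets": 100000
  },
  "stormConfig": {
    "topology.max.spout.pending": 100,
    "topology.workers": 5
  }
}
```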
When I try to index the NetFlow log file using a Metron sensor (CSV parser), it hangs when it reaches around 0.5 million records. If I reduce maxUncommittedOffsets to 10000, the hang occurs near 50k, so it seems to depend on maxUncommittedOffsets. Each time this issue happens, if I delete the Kafka topics and restart Storm and Metron, things start working again, and then hang once they reach the same level. When I monitor the enrichment and indexing input topics with a Kafka consumer, messages stop arriving at the main sensor input topic.
I have disabled both Solr and Elasticsearch indexing via the indexing configuration in the Metron UI. Could this be affecting the offset commit?
What happens in the Storm topology when the number of uncommitted messages reaches spout.maxUncommittedOffsets? Does this check the committed value from ZooKeeper?
I have tried this many times, and whenever the issue occurs I see the log entry below repeated continuously in my worker log. Could this be the problem?
topic-partition [netflow-1] has unexpected offset . Current committed Offset