Created 05-09-2018 02:19 PM
I have an Ambari-managed 10-node HDP cluster (2.5.3.0). I am trying to index my NetFlow log file (1 million records) using the CSV parser. Since my cluster nodes only have 8 GB of RAM, I have throttled the indexing traffic with spout.maxUncommittedOffsets = 100000 and topology.max.spout.pending = 100 to avoid resource contention in my Storm workers; a rough sketch of how I apply these limits follows the configuration list below.
My Storm configuration:
Number of Kafka partitions: 5
Number of workers: 5
Storm supervisors: 5
Number of executors: 5
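For reference, here is a minimal sketch of how I apply the worker count and the topology.max.spout.pending throttle when a topology is submitted. The class and method names are only for illustration; Metron's parser topology actually picks these values up from its own configuration rather than from hand-written Java like this.

import org.apache.storm.Config;

// Hypothetical helper that builds only the throttling part of the topology
// configuration described above; the rest of the topology wiring is unchanged.
public class ParserTopologyConf {
    public static Config build() {
        Config conf = new Config();
        conf.setNumWorkers(5);          // 5 workers, one per supervisor
        conf.setMaxSpoutPending(100);   // topology.max.spout.pending = 100
        return conf;
    }

    public static void main(String[] args) {
        // Config extends HashMap, so printing it shows the resolved keys and values.
        System.out.println(build());
    }
}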
When I index the NetFlow log file with the Metron sensor (CSV parser), it hangs once it reaches around 0.5 million records. If I reduce maxUncommittedOffsets to 10000, the hang occurs near 50k, so it seems to depend on maxUncommittedOffsets. Each time this happens, if I delete the Kafka topics and restart Storm and Metron, things start working again and then hang at the same level. When I monitor the enrichment and indexing input topics with a Kafka consumer, messages stop arriving on the main sensor input topic. I have disabled both Solr and Elasticsearch indexing via the indexing configuration in the Metron UI. Could this be affecting the offset commits? What happens in the Storm topology when the number of uncommitted messages reaches spout.maxUncommittedOffsets? Does it check the committed value from ZooKeeper?
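For what it's worth, below is roughly how I check whether the commits have stalled while the topology is hung. The broker address, group id and topic/partition are placeholders for my setup (I look at partition 1 because that is the one in the warning below); my assumption is that the new-consumer-based Kafka spout commits offsets to Kafka rather than ZooKeeper, which is part of what I am trying to confirm.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Small diagnostic: compare the committed offset of the parser's consumer
// group with the log-end offset of the partition, to see whether commits
// have stalled. Broker address, topic and group id are placeholders.
public class CommittedOffsetCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafkabroker:6667");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "netflow_parser");   // the spout's consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("netflow", 1);

            // Committed offset of the group for this partition.
            OffsetAndMetadata committed = consumer.committed(tp);

            // Log-end offset of the same partition.
            consumer.assign(Collections.singletonList(tp));
            consumer.seekToEnd(Collections.singletonList(tp));
            long logEnd = consumer.position(tp);

            System.out.println("committed = " + (committed == null ? "none" : committed.offset())
                    + ", log end = " + logEnd);
        }
    }
}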
I have tried this many times, and whenever the issue occurs I see the log entry below repeated continuously in my worker log. Could this be the problem?
topic-partition [netflow-1] has unexpected offset [1118]. Current committed Offset [400359]
Indexing config:
{
  "hdfs": {
    "batchSize": 50,
    "enabled": true,
    "index": "netflow"
  },
  "elasticsearch": {
    "batchSize": 1,
    "enabled": false,
    "index": "netflow"
  },
  "solr": {
    "batchSize": 1,
    "enabled": false,
    "index": "netflow"
  }
}
Spout config:
{ "poll.timeout.ms": 100000, "session.timeout.ms": 39000, "max.poll.records": 2000, "spout.pollTimeoutMs": 20000, "spout.maxUncommittedOffsets": 100000, "spout.offsetCommitPeriodMs": 30000 }
Created 10-11-2018 06:38 PM
Hi there.
Were you able to address the issue described here?