I am using the Metron framework to build an application that works on real-time data. We write the data to ES and HDFS. We run into a problem when data goes through indexing and the acked count reaches the 10,000,000 mark. This happens every time the count reaches this mark.
Please find the indexing configuration below:
# workers: 1
# executors: 5
# tasks: 5
replication count: 1
assigned memory: 4160
My question: is this happening because of a default limit defined somewhere? If so, can we change that limit to unlimited?
Also, is this related to the assigned memory?
Can anyone please help me with this?
The short answer is no: there is no setting that limits the indexing topology to a fixed number of records.
I think you have to look for the cause elsewhere:
-Check the size of the source Kafka topic (indexing)
-Check the 'first poll offset strategy' setting for the indexing topology (is it starting from the beginning each time with EARLIEST?)
-Check the indexing configuration in Zookeeper
-Are there any files in the HDFS indexing location, and how many records were written there?
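To make the 'first poll offset strategy' check concrete, here is a minimal Python sketch of how I understand the four strategies of the Storm Kafka spout (EARLIEST, LATEST, UNCOMMITTED_EARLIEST, UNCOMMITTED_LATEST) to pick the starting offset. It is a toy model, not Metron or Storm code: the function name and the offset numbers are made up for illustration.

```python
def first_poll_offset(strategy, beginning, end, committed=None):
    """Toy model: which offset the spout starts reading from on its first poll.

    beginning / end are the first and next-to-be-written offsets of a partition,
    committed is the consumer group's committed offset (None if nothing was committed).
    All names here are hypothetical and only illustrate the strategy semantics.
    """
    if strategy == "EARLIEST":
        # Ignores previous commits: replays the partition from the start.
        return beginning
    if strategy == "LATEST":
        # Ignores previous commits: skips everything already in the topic.
        return end
    if strategy == "UNCOMMITTED_EARLIEST":
        # Resumes from the last commit, falls back to the start if none exists.
        return committed if committed is not None else beginning
    if strategy == "UNCOMMITTED_LATEST":
        # Resumes from the last commit, falls back to the end if none exists.
        return committed if committed is not None else end
    raise ValueError(f"unknown strategy: {strategy}")


# Example: a topic holding 10,000,000 records, of which 7,500,000 were committed.
for s in ("EARLIEST", "UNCOMMITTED_EARLIEST", "LATEST"):
    print(s, first_poll_offset(s, 0, 10_000_000, committed=7_500_000))
```

If the topic holds about 10,000,000 records and the strategy is EARLIEST, every restart of the topology would re-read the same records and then stop at the same count, which would look like a fixed limit.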
It could be that the setting at Ambari > Metron > Indexing > "Indexing Max Pending" is limiting the indexing output. Is that one set to 10,000,000 by any chance?
This setting limits the total number of in-flight tuples (read from the Kafka input topic but not yet acked) for the topology. If that is what is happening here, the tuples are not being acked for one or both outputs (HDFS / ES), so they are never marked as no longer in flight.
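A rough way to picture this is the toy Python simulation below. It is not Storm code, and the function and parameter names are hypothetical. It assumes the Ambari field maps to Storm's topology.max.spout.pending, which Storm applies per spout task, so with several spout tasks the effective total would be a multiple of the configured value.

```python
def simulate_spout(total_available, max_pending, acked_per_step, steps):
    """Toy model of a max-pending cap on a spout.

    The spout may emit only while (emitted - acked) is below max_pending.
    acked_per_step is how many tuples the downstream writers ack per step.
    Returns (emitted, acked) after the given number of steps.
    """
    emitted = 0
    acked = 0
    for _ in range(steps):
        in_flight = emitted - acked
        # Emit as many tuples as the cap and the remaining input allow.
        can_emit = min(max_pending - in_flight, total_available - emitted)
        emitted += max(can_emit, 0)
        # Downstream acks free up room for the next step.
        acked += min(acked_per_step, emitted - acked)
    return emitted, acked


# Healthy case: everything is acked, so all 50 tuples get through.
print(simulate_spout(total_available=50, max_pending=10, acked_per_step=10, steps=5))
# Stalled case: nothing is acked, so emission stops at the cap of 10.
print(simulate_spout(total_available=50, max_pending=10, acked_per_step=0, steps=5))
```

In the stalled case the emitted count freezes exactly at the cap, no matter how many steps run. A count that always stops at the same round number would be consistent with this, provided the writers are failing to ack.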