How do I verify all of my data from enrichment made it to indexing and ES?

Expert Contributor

Hi,

I have a Metron cluster running with data feeds coming in. Is there a way to verify that a given number of documents comes through Kafka for a topic, then through parsing and enrichment, and finally gets indexed to ES? I'm asking because I don't see as many documents indexed yesterday and today (for instance) as I usually do each day.

Thanks in advance for feedback.

1 ACCEPTED SOLUTION

Super Collaborator

Hi Arian,

If you want to get serious about the volumes of processed feeds, it is time to start tracking the Kafka offset growth:

/usr/hdp/<VERSION>/kafka/bin/kafka-consumer-offset-checker.sh --zookeeper zk_host:2181 --security-protocol SASL_PLAINTEXT --topic parsing --group parsing

/usr/hdp/<VERSION>/kafka/bin/kafka-consumer-offset-checker.sh --zookeeper zk_host:2181 --security-protocol SASL_PLAINTEXT --topic enrichments --group enrichments

/usr/hdp/<VERSION>/kafka/bin/kafka-consumer-offset-checker.sh --zookeeper zk_host:2181 --security-protocol SASL_PLAINTEXT --topic indexing --group indexing

Just check the growth of the topics in between some test runs (if all topologies run continuously, it will be tricky to squeeze out the exact numbers).

In Elastic, make sure you set your queries right (check which date_stamp is used as the index time) to make a fair comparison. Errors during parsing, enrichment or indexing can also account for some gaps, depending on where you direct those in your Metron config.
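
To make the Elasticsearch side of the comparison concrete, you could count documents over an explicit time window instead of eyeballing index sizes. Just a sketch (es_host, the squid_index_* pattern and the timestamp field are assumptions; substitute the index pattern and date field your templates actually use):

curl -s -XGET 'http://es_host:9200/squid_index_*/_count' -H 'Content-Type: application/json' -d '
{
  "query": {
    "range": {
      "timestamp": { "gte": "now-24h" }
    }
  }
}'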

The Storm UI's numbers are not that easy to get right, so don't waste too much time on those 🙂


9 REPLIES


Expert Contributor

Thank you very much for your response @Jasper

When I tried to run kafka-consumer-offset-checker, it said that it's deprecated and that I don't have a JAAS configuration in place. A quick lookup suggests it's something about a single sign-on Kerberos account? I don't know if I need that, but I'll look into it and find out. I'll be back with more updates, thank you!

Super Collaborator

@Arian Trayen just ignore the deprecation message. The Kafka project wants to deprecate the tool, but the replacement is not complete yet, so you can safely ignore that warning for now.

If you don't have Kerberos, leave out the "--security-protocol SASL_PLAINTEXT" part.
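
For example, the check on the indexing topic without Kerberos becomes:

/usr/hdp/<VERSION>/kafka/bin/kafka-consumer-offset-checker.sh --zookeeper zk_host:2181 --topic indexing --group indexing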

Expert Contributor

Thank you so much @Jasper

This is pretty neat! It provides offset, logSize and lag information. I'm fairly new to Kafka as well, and it looks like I'm very behind on indexing. However, from the Storm UI it didn't look like I had that many incoming messages for the indexing topic. I was reading that lag should be close to 0, which would indicate that the system is caught up. How do I get the lag down to 0? Do I need more indexing Storm workers?

Offset: 82387393

logSize: 326704262

Lag: 244316869
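
If I'm reading the output right, the lag is just the log size minus the committed offset, and these numbers line up:

326704262 - 82387393 = 244316869

So the topic holds about 326M messages while my indexing consumer group has only committed about 82M of them.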

As always, thank you for your time and response.

Super Collaborator

Look out for any errors in the indexing topology's worker log at

/var/log/storm/worker-artifacts/<indexing-####-1234566>/6701/worker.log

(replace "indexing-####-1234566" with the real id of your current indexing topology, which you can find in the Storm UI.)

This will probably reveal the problem. If not, you could also run the indexing topology in DEBUG mode for a while (also via the Storm UI).
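
To quickly scan that worker log for problems you could do something like this (just a sketch; fill in the real topology id as above):

grep -iE 'error|exception' /var/log/storm/worker-artifacts/<indexing-####-1234566>/6701/worker.log | tail -50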

Expert Contributor

Thank you @Jasper

I noticed I kept getting an error about fetching an offset that is out of range. I had changed the Kafka log retention rule to be shorter because I kept running out of space: the Kafka log for pcap ingestion took up all my disk. Since I stopped ingesting pcap, I have reverted the Kafka retention rule, and hopefully it won't complain about trying to read an offset that has already been wiped out. If this doesn't work, I'll try the DEBUG mode that you suggested. Thank you again for your help!

Fetch offset 82387394 is out of range for partition indexing-0, resetting offset
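
To double-check, I'm planning to compare the committed offset against what is still retained in the topic. From what I've read, Kafka's GetOffsetShell can print the earliest (-2) and latest (-1) offsets per partition (broker_host:6667 is a placeholder for a real broker; not sure yet how this behaves on a secured cluster):

/usr/hdp/<VERSION>/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker_host:6667 --topic indexing --time -2

/usr/hdp/<VERSION>/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list broker_host:6667 --topic indexing --time -1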

Super Collaborator

@Arian Trayen

Normally you can overcome this by stopping the topology, changing the first poll strategy for the topology in Ambari to "EARLIEST" (or "LATEST" if you want) for one restart only, and then restarting the topology.

(Don't forget to put it back to UNCOMMITTED_EARLIEST once the "out-of-range" errors are gone.)

Expert Contributor

Thank you very much @Jasper

I was able to do that for the indexing topology, but how do you set that for the parsing and enrichment topologies?

I still see a large number of failures under the indexing topology, but nothing obvious in the logs. Occasionally I see Kafka marking the coordinator as dead and then discovering it again; I don't know how to fix that, but it goes away after a while.

Expert Contributor

FYI... I didn't know the names of some of my Kafka consumer groups, and I figured out a way to list them all and describe each of them. Hopefully it will help someone like me.

/usr/hdp/<VERSION>/kafka/bin/kafka-consumer-groups.sh --list --zookeeper <zk_host:port> | while read group; do echo "$group"; /usr/hdp/<VERSION>/kafka/bin/kafka-consumer-groups.sh --zookeeper <zk_host:port> --describe --group "$group"; done
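
One more note: from what I've read, if the consumer groups commit their offsets to Kafka instead of ZooKeeper, the same tool can be pointed at a broker instead (broker_host:6667 is a placeholder, and older Kafka releases may also need the --new-consumer flag):

/usr/hdp/<VERSION>/kafka/bin/kafka-consumer-groups.sh --bootstrap-server broker_host:6667 --list

/usr/hdp/<VERSION>/kafka/bin/kafka-consumer-groups.sh --bootstrap-server broker_host:6667 --describe --group indexing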