Enrich syslog using NiFi

New Contributor

Hello,

I am trying to enrich syslog data at scale with NiFi. I would like to look up the source IP address in each syslog message against a table to retrieve a hostname.

The lookup could either be a DNS query or a file mapping that I maintain. The key requirement is that the solution scales to millions of messages a day. Issuing a DNS query for every single syslog message does not seem feasible unless the results are cached, and caching does not seem to be part of the processor.

I also considered using ReplaceTextWithMapping and maintaining a table myself. But this doesn't seem to work: it does not match correctly even with a simple regex, and it seems you can only replace the text you matched directly. You cannot easily insert into another part of the message.

Is there an approach I am missing based on my lack of understanding?
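
For reference, the file-mapping idea from the question can be sketched outside NiFi as follows. This is only an illustration of the logic, not a NiFi processor: the `ip,hostname` CSV layout, the `src_host=` field name, and the helper names `load_mapping` and `enrich` are all assumptions made for the example.

```python
import csv
import io
import re

# Matches the first dotted-quad IPv4 address in a message (no range validation).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def load_mapping(csv_file):
    """Read 'ip,hostname' rows (assumed layout) into a dict for fast lookups."""
    return {
        row[0].strip(): row[1].strip()
        for row in csv.reader(csv_file)
        if len(row) >= 2
    }


def enrich(message, mapping, default="unknown"):
    """Append src_host=<hostname> based on the first IP found in the message."""
    match = IPV4_RE.search(message)
    if match is None:
        return message  # nothing to look up; pass the message through unchanged
    hostname = mapping.get(match.group(0), default)
    # The hostname is added as a new field instead of replacing the matched IP.
    return f"{message} src_host={hostname}"


# Hypothetical mapping data; in practice this would be a file on disk.
mapping = load_mapping(
    io.StringIO("10.0.0.5,web01.example.com\n10.0.0.6,db01.example.com\n")
)
print(enrich("<34>Oct 11 22:14:15 10.0.0.5 sshd[123]: Accepted password", mapping))
```

Because the mapping is held in a dictionary, each message costs one regex search and one hash lookup, which is what makes a maintained table attractive at this volume.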

Contributor

Hi @Omid Krabbe

You could point the NiFi processor at a caching DNS server that forwards to the upstream nameservers you want to use. [1]

This would allow you to benefit from caching without needing to implement it yourself. If your volume is high enough, you could even run dnsmasq on each node and have each NiFi instance use its local dnsmasq.

Another alternative would be to use the PutDistributedMapCache[2] and FetchDistributedMapCache[3] processors to cache the lookups.

Thanks,

Bryan

[1] https://www.g-loaded.eu/2010/09/18/caching-nameserver-using-dnsmasq/

[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutDistributed...

[3] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.FetchDistribut...
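
The caching idea in the reply above boils down to a check-then-populate pattern: look in the cache first, and only resolve and store on a miss. A minimal sketch of that pattern in plain Python is shown below. It is not NiFi code and does not describe how the cache processors are implemented; `TTLCache`, `reverse_dns`, `cached_lookup`, and the one-hour TTL are hypothetical names and values chosen for illustration.

```python
import socket
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expiry_time, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # stale entry: drop it and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def reverse_dns(ip):
    """Resolve an IP to a hostname; fall back to the IP itself on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def cached_lookup(ip, cache, resolver=reverse_dns):
    """Check the cache first; resolve and store only on a miss."""
    hostname = cache.get(ip)
    if hostname is None:
        hostname = resolver(ip)
        cache.put(ip, hostname)
    return hostname
```

With this shape, the number of real DNS queries depends on how many distinct source IPs appear within one TTL window, not on the number of messages. The sketch never evicts unexpired entries, so a real deployment would also want a size bound.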

New Contributor

Thanks for the answer, Bryan. How big would a cluster have to be to handle, say, 200 million events? I was just thinking that if it were possible to maintain some sort of in-memory cache on each instance, it would be dramatically more efficient.

Contributor

@Omid Krabbe

I updated the answer with a caching solution. Please let me know if that's what you were looking for.

New Contributor

Thanks @brosander, I will research these. Is it OK if I leave this open in case someone else has solved this in the past?

Contributor

@Omid Krabbe sure, that's fine 🙂

In addition to the processors pointed out by @brosander, there is also the QueryDNS processor[1] which may be of use.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.enrich.QueryDNS/index.h...

New Contributor

Thanks @jpercivall, I did see that processor, but I am skeptical that it can work at scale without some serious infrastructure to support it.