Created on 01-18-2019 09:29 PM - edited 08-17-2019 04:58 AM
I need to parse Kerberos KDC log files (including the file currently being written) to find which users are connecting and from which hosts.
Using Grok in NiFi, we can parse out many different parts of these files and use them for filtering and alerting with ease.
This is what many of the lines in the log file look like:
Jan 01 03:31:01 somenewserver-310 krb5kdc[28593](info): AS_REQ (4 etypes {18 17 16 23}) 192.168.237.220: ISSUE: authtime 1546278185, etypes {rep=18 tkt=16 ses=18}, nn/somenewserver-310.field.hortonworks.com@HWX.COM for nn/somenewserver-310.field.hortonworks.com@HWX.COM
State of the Tail Processor
Tail a File
We also have the option of using the GrokReader, described in an article listed under Resources, to immediately convert matching records to output formats like JSON or Avro and then partition them into groups. We'll do that in a later article.
In this one, we can get a line from the file via Tail, read a list of files and fetch them one at a time, or generate a flow file for testing. Once we have some data, we'll start parsing it into different message types. These messages can then be used for alerting, routing, and permanent storage in Hive/Impala/HBase/Kudu/Druid/S3/Object Storage/etc.
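Before building the flow, it can help to check the message-type split on a few lines outside NiFi. The following is a minimal Python sketch, not part of the NiFi flow: the function name and the labels are made up for illustration, and the markers it checks ("ISSUE:", "NEEDED_PREAUTH:", "failure") are simply the literals the Grok expressions in this article key on.

```python
def classify(line: str) -> str:
    """Label a KDC log line by the marker text it contains."""
    if " ISSUE: " in line:
        return "ISSUE"
    if " NEEDED_PREAUTH: " in line:
        return "PREAUTH"
    if "failure" in line:
        return "FAILURE"
    # Anything else is left for a catch-all route.
    return "OTHER"


# Sample line taken from this article.
sample = (
    "Jan 01 03:31:01 somenewserver-310 krb5kdc[28593](info): "
    "AS_REQ (4 etypes {18 17 16 23}) 192.168.237.220: ISSUE: "
    "authtime 1546278185, etypes {rep=18 tkt=16 ses=18}, "
    "nn/somenewserver-310.field.hortonworks.com@HWX.COM for "
    "nn/somenewserver-310.field.hortonworks.com@HWX.COM"
)
print(classify(sample))
```

In a flow, the same split would be done by routing on which Grok expression matches; the sketch is only a quick sanity check of the markers.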
In the next step we will do some routing and alerting. We'll follow that with some natural language processing (NLP) and machine learning, and then use various tools to search, aggregate, query, catalog, report on, and build dashboards from this type of log and others.
Example Output JSON Formatted
PREAUTH
{ "date" : "Jan 07 02:25:15", "etypes" : "2 etypes {23 16}", "MONTH" : "Jan", "HOUR" : "02", "emailhost" : "cloudera.net", "TIME" : "02:25:15", "pid" : "21546", "loghost" : "KDCHOST1", "kuser" : "krbtgt", "message" : "Additional pre-authentication required", "emailuser" : "user1", "MINUTE" : "25", "SECOND" : "15", "LOGLEVEL" : "info", "MONTHDAY" : "01", "apphost" : "APP_HOST1", "kuserhost" : "cloudera.net@cloudera.net" }
ISSUE
{ "date" : "Jan 01 03:20:09", "etypes" : "2 etypes {23 18}", "MONTH" : "Jan", "HOUR" : "03", "BASE10NUM" : "1546330809", "emailhost" : "cloudera.net", "TIME" : "03:20:09", "pid" : "24546", "loghost" : "KDCHOST1", "kuser" : "krbtgt", "message" : "", "emailuser" : "user1", "authtime" : "1546330809", "MINUTE" : "20", "SECOND" : "09", "etypes2" : "rep=23 tkt=18 ses=23", "LOGLEVEL" : "info", "MONTHDAY" : "01", "apphost" : "APP_HOST1", "kuserhost" : "cloudera.net@cloudera.net" }
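Every value in the output above is a string, so a field such as authtime needs converting before it can be compared or aggregated. Here is a minimal Python sketch, assuming authtime is seconds since the Unix epoch; the helper name is made up for illustration.

```python
from datetime import datetime, timezone


def authtime_to_utc(authtime: str) -> datetime:
    """Convert an authtime string (assumed epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(int(authtime), tz=timezone.utc)


# authtime value from the ISSUE example output above.
issued = authtime_to_utc("1546330809")
print(issued.isoformat())
```

For the ISSUE example this gives 08:20:09 UTC on 2019-01-01, five hours ahead of the 03:20:09 in the log line's own timestamp, which would be consistent with the KDC host logging in a UTC-5 local time.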
Grok Expressions
For Parsing Failure Records
%{SYSLOGTIMESTAMP:date} %{HOSTNAME:loghost} krb5kdc\[%{POSINT:pid}\]\(%{LOGLEVEL}\): %{GREEDYDATA:premessage}failure%{GREEDYDATA:postmessage}
For Parsing PREAUTH Records
%{SYSLOGTIMESTAMP:date} %{HOSTNAME:loghost} krb5kdc\[%{POSINT:pid}\]\(%{LOGLEVEL}\): AS_REQ \(%{GREEDYDATA:etypes}\) %{GREEDYDATA:apphost}: NEEDED_PREAUTH: %{USERNAME:emailuser}@%{HOSTNAME:emailhost} for %{GREEDYDATA:kuser}/%{GREEDYDATA:kuserhost}, %{GREEDYDATA:message}
For Parsing ISSUE Records
%{SYSLOGTIMESTAMP:date} %{HOSTNAME:loghost} krb5kdc\[%{POSINT:pid}\]\(%{LOGLEVEL}\): AS_REQ \(%{GREEDYDATA:etypes}\) %{GREEDYDATA:apphost}: ISSUE: authtime %{NUMBER:authtime}, etypes \{%{GREEDYDATA:etypes2}\}, %{USERNAME:emailuser}@%{HOSTNAME:emailhost} for %{GREEDYDATA:kuser}/%{GREEDYDATA:kuserhost}%{GREEDYDATA:message}
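To try the ISSUE expression outside NiFi, here is a minimal Python sketch. It approximates the Grok patterns with plain regular expressions (the character classes are simplified stand-ins, not the exact Grok definitions), and the sample line is reconstructed from the ISSUE example output in this article rather than copied from a real log.

```python
import re

# Simplified stand-ins for SYSLOGTIMESTAMP, HOSTNAME, POSINT, LOGLEVEL, NUMBER,
# USERNAME and GREEDYDATA. These are assumptions for illustration only.
ISSUE_RE = re.compile(
    r"(?P<date>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<loghost>[\w.-]+) "
    r"krb5kdc\[(?P<pid>\d+)\]\((?P<LOGLEVEL>\w+)\): "
    r"AS_REQ \((?P<etypes>.*)\) (?P<apphost>.*): "
    r"ISSUE: authtime (?P<authtime>\d+), "
    r"etypes \{(?P<etypes2>.*)\}, "
    r"(?P<emailuser>[\w.-]+)@(?P<emailhost>[\w.-]+) "
    r"for (?P<kuser>.*)/(?P<kuserhost>.*)(?P<message>.*)"
)


def parse_issue(line: str):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = ISSUE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical line rebuilt from the ISSUE example output in this article.
sample = (
    "Jan 01 03:20:09 KDCHOST1 krb5kdc[24546](info): "
    "AS_REQ (2 etypes {23 18}) APP_HOST1: ISSUE: authtime 1546330809, "
    "etypes {rep=23 tkt=18 ses=23}, "
    "user1@cloudera.net for krbtgt/cloudera.net@cloudera.net"
)
fields = parse_issue(sample)
print(fields["emailuser"], fields["apphost"], fields["authtime"])
```

In this sketch a service principal such as nn/somenewserver-310.field.hortonworks.com@HWX.COM does not match, because the user part excludes "/". Grok's USERNAME pattern also has no "/", so lines like that may need their own expression; it is worth checking in a Grok debugger.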
Resources:
For Testing Grok Against Your Files
http://grokdebug.herokuapp.com/
A Great Article on Using GrokReader for Record Oriented Processing
More About Grok
https://datahovel.com/2018/07/
http://grokconstructor.appspot.com/do/automatic?example=0
https://gist.github.com/acobaugh/5aecffbaaa593d80022b3534e5363a2d