Community Articles

TimothySpann · 01-18-2019

I need to parse Kerberos KDC log files (including the file currently being written) to find which users are connecting and from which hosts.

Using Grok in NiFi, we can parse out many different parts of these files and use them for filtering and alerting.

This is what many of the lines in the log file look like:

Jan 01 03:31:01 somenewserver-310 krb5kdc[28593](info): AS_REQ (4 etypes {18 17 16 23}) 192.168.237.220: ISSUE: authtime 1546278185, etypes {rep=18 tkt=16 ses=18}, nn/somenewserver-310.field.hortonworks.com@HWX.COM for nn/somenewserver-310.field.hortonworks.com@HWX.COM

State of the Tail Processor

Tail a File

We also have the option of using the GrokReader, described in an article linked under Resources, to immediately convert matching records to output formats like JSON or Avro and then partition them into groups. We'll do that in a later article.

In this one, we can get lines from the file by tailing it, read a list of files and fetch them one at a time, or generate a flow file for testing. Once we have some data, we'll start parsing it into different message types. These messages can then be used for alerting, routing, and permanent storage in Hive, Impala, HBase, Kudu, Druid, S3, object storage, and so on.
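Outside NiFi, the idea of sorting lines into message types can be sketched in a few lines of Python. This is only an illustration, not the NiFi flow itself: the function name classify_kdc_line, the type labels, and the order of the checks are assumptions, while the marker strings (NEEDED_PREAUTH, ISSUE, failure) come from the log lines and Grok expressions in this article.

```python
def classify_kdc_line(line: str) -> str:
    """Return a coarse message type for one krb5kdc log line.

    The labels and the order of the checks are illustrative assumptions.
    """
    if "NEEDED_PREAUTH:" in line:
        return "PREAUTH"
    if "ISSUE:" in line:
        return "ISSUE"
    if "failure" in line:
        return "FAILURE"
    return "OTHER"


# The sample line shown at the top of this article.
sample = (
    "Jan 01 03:31:01 somenewserver-310 krb5kdc[28593](info): "
    "AS_REQ (4 etypes {18 17 16 23}) 192.168.237.220: ISSUE: "
    "authtime 1546278185, etypes {rep=18 tkt=16 ses=18}, "
    "nn/somenewserver-310.field.hortonworks.com@HWX.COM for "
    "nn/somenewserver-310.field.hortonworks.com@HWX.COM"
)

print(classify_kdc_line(sample))  # ISSUE
```

A cheap substring check like this is enough to decide which Grok expression to try on a line; the expressions themselves do the field extraction.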

In the next step we will do some routing and alerting. We will follow up with some natural language processing (NLP) and machine learning, and then use various tools to search, aggregate, query, catalog, report on, and build dashboards from this type of log and others.

Example Output, Formatted as JSON

PREAUTH

{
  "date" : "Jan 07 02:25:15",
  "etypes" : "2 etypes {23 16}",
  "MONTH" : "Jan",
  "HOUR" : "02",
  "emailhost" : "cloudera.net",
  "TIME" : "02:25:15",
  "pid" : "21546",
  "loghost" : "KDCHOST1",
  "kuser" : "krbtgt",
  "message" : "Additional pre-authentication required",
  "emailuser" : "user1",
  "MINUTE" : "25",
  "SECOND" : "15",
  "LOGLEVEL" : "info",
  "MONTHDAY" : "01",
  "apphost" : "APP_HOST1",
  "kuserhost" : "cloudera.net@cloudera.net"
}

ISSUE

{
  "date" : "Jan 01 03:20:09",
  "etypes" : "2 etypes {23 18}",
  "MONTH" : "Jan",
  "HOUR" : "03",
  "BASE10NUM" : "1546330809",
  "emailhost" : "cloudera.net",
  "TIME" : "03:20:09",
  "pid" : "24546",
  "loghost" : "KDCHOST1",
  "kuser" : "krbtgt",
  "message" : "",
  "emailuser" : "user1",
  "authtime" : "1546330809",
  "MINUTE" : "20",
  "SECOND" : "09",
  "etypes2" : "rep=23 tkt=18 ses=23",
  "LOGLEVEL" : "info",
  "MONTHDAY" : "01",
  "apphost" : "APP_HOST1",
  "kuserhost" : "cloudera.net@cloudera.net"
}
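The authtime value in the ISSUE record is a Unix epoch timestamp in seconds. Unlike the syslog-style date field, which carries no year or time zone, it can be converted to an unambiguous time. A minimal sketch in Python, where the helper name authtime_to_utc is an assumption made for illustration:

```python
from datetime import datetime, timezone


def authtime_to_utc(authtime: str) -> str:
    """Convert the epoch-seconds authtime string to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(authtime), tz=timezone.utc).isoformat()


# authtime taken from the ISSUE example above
print(authtime_to_utc("1546330809"))  # 2019-01-01T08:20:09+00:00
```

The date field of the same record reads 03:20:09, which would be consistent with a KDC host logging in a zone five hours behind UTC.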

Grok Expressions

For Parsing Failure Records

%{SYSLOGTIMESTAMP:date} %{HOSTNAME:loghost} krb5kdc\[%{POSINT:pid}\]\(%{LOGLEVEL}\): %{GREEDYDATA:premessage}failure%{GREEDYDATA:postmessage}

For Parsing PREAUTH Records

%{SYSLOGTIMESTAMP:date} %{HOSTNAME:loghost} krb5kdc\[%{POSINT:pid}\]\(%{LOGLEVEL}\): AS_REQ \(%{GREEDYDATA:etypes}\) %{GREEDYDATA:apphost}: NEEDED_PREAUTH: %{USERNAME:emailuser}@%{HOSTNAME:emailhost} for %{GREEDYDATA:kuser}/%{GREEDYDATA:kuserhost}, %{GREEDYDATA:message}
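To see how the PREAUTH expression splits a line into fields, here is a rough Python approximation. It is a sketch under assumptions: the regular expressions for SYSLOGTIMESTAMP, HOSTNAME, USERNAME, POSINT, and LOGLEVEL are simplified hand-written stand-ins, not the real Grok pattern definitions, and the input line is reconstructed from the field values in the PREAUTH JSON example above rather than copied from a log file.

```python
import re

# Simplified stand-ins for the Grok patterns (assumptions, not Grok's definitions).
SYSLOGTIMESTAMP = r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"
HOSTNAME = r"[0-9A-Za-z][0-9A-Za-z.-]*"
USERNAME = r"[a-zA-Z0-9._-]+"

PREAUTH_RE = re.compile(
    r"(?P<date>" + SYSLOGTIMESTAMP + r") (?P<loghost>" + HOSTNAME + r") "
    r"krb5kdc\[(?P<pid>[1-9][0-9]*)\]\((?P<LOGLEVEL>[A-Za-z]+)\): "
    r"AS_REQ \((?P<etypes>.*)\) (?P<apphost>.*): NEEDED_PREAUTH: "
    r"(?P<emailuser>" + USERNAME + r")@(?P<emailhost>" + HOSTNAME + r") "
    r"for (?P<kuser>.*)/(?P<kuserhost>.*), (?P<message>.*)"
)

# Line reconstructed from the PREAUTH JSON example (not a real log excerpt).
line = (
    "Jan 07 02:25:15 KDCHOST1 krb5kdc[21546](info): "
    "AS_REQ (2 etypes {23 16}) APP_HOST1: NEEDED_PREAUTH: "
    "user1@cloudera.net for krbtgt/cloudera.net@cloudera.net, "
    "Additional pre-authentication required"
)

match = PREAUTH_RE.search(line)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(name, "=", value)
```

The greedy `.*` groups play the role of GREEDYDATA: each one takes as much as it can and then backtracks until the literal text that follows it (such as `: NEEDED_PREAUTH: ` or `/`) lines up.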

For Parsing ISSUE Records

%{SYSLOGTIMESTAMP:date} %{HOSTNAME:loghost} krb5kdc\[%{POSINT:pid}\]\(%{LOGLEVEL}\): AS_REQ \(%{GREEDYDATA:etypes}\) %{GREEDYDATA:apphost}: ISSUE: authtime %{NUMBER:authtime}, etypes \{%{GREEDYDATA:etypes2}\}, %{USERNAME:emailuser}@%{HOSTNAME:emailhost} for %{GREEDYDATA:kuser}/%{GREEDYDATA:kuserhost}%{GREEDYDATA:message}
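The ISSUE expression can be approximated in Python the same way. As before, this is a sketch under assumptions: the sub-patterns are simplified hand-written stand-ins for the Grok definitions, and the first test line is reconstructed from the ISSUE JSON example above. The second test line is the service-principal line shown at the top of this article.

```python
import re

# Simplified stand-ins for the Grok patterns (assumptions, not Grok's definitions).
SYSLOGTIMESTAMP = r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"
HOSTNAME = r"[0-9A-Za-z][0-9A-Za-z.-]*"
USERNAME = r"[a-zA-Z0-9._-]+"

ISSUE_RE = re.compile(
    r"(?P<date>" + SYSLOGTIMESTAMP + r") (?P<loghost>" + HOSTNAME + r") "
    r"krb5kdc\[(?P<pid>[1-9][0-9]*)\]\((?P<LOGLEVEL>[A-Za-z]+)\): "
    r"AS_REQ \((?P<etypes>.*)\) (?P<apphost>.*): ISSUE: "
    r"authtime (?P<authtime>\d+), etypes \{(?P<etypes2>.*)\}, "
    r"(?P<emailuser>" + USERNAME + r")@(?P<emailhost>" + HOSTNAME + r") "
    r"for (?P<kuser>.*)/(?P<kuserhost>.*)(?P<message>.*)"
)

# Line reconstructed from the ISSUE JSON example (not a real log excerpt).
user_line = (
    "Jan 01 03:20:09 KDCHOST1 krb5kdc[24546](info): "
    "AS_REQ (2 etypes {23 18}) APP_HOST1: ISSUE: authtime 1546330809, "
    "etypes {rep=23 tkt=18 ses=23}, "
    "user1@cloudera.net for krbtgt/cloudera.net@cloudera.net"
)

# The service-principal line from the top of the article.
service_line = (
    "Jan 01 03:31:01 somenewserver-310 krb5kdc[28593](info): "
    "AS_REQ (4 etypes {18 17 16 23}) 192.168.237.220: ISSUE: "
    "authtime 1546278185, etypes {rep=18 tkt=16 ses=18}, "
    "nn/somenewserver-310.field.hortonworks.com@HWX.COM for "
    "nn/somenewserver-310.field.hortonworks.com@HWX.COM"
)

user_match = ISSUE_RE.search(user_line)
service_match = ISSUE_RE.search(service_line)

print(user_match.groupdict() if user_match else "no match")
print(service_match.groupdict() if service_match else "no match")
```

In this approximation the user line parses into the same fields as the JSON example, with an empty message because the greedy kuserhost group consumes the rest of the line. The service-principal line does not match, because the client principal `nn/...` contains a slash that the USERNAME-style character class excludes. Whether the original Grok expression behaves the same way in NiFi is worth checking in a Grok debugger before relying on it for service principals.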

Resources:

For Testing Grok Against Your Files
http://grokdebug.herokuapp.com/

A Great Article on Using GrokReader for Record Oriented Processing

https://community.hortonworks.com/articles/131320/using-partitionrecord-grokreaderjsonwriter-to-pars...

More About Grok

https://datahovel.com/2018/07/

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services...

http://grokconstructor.appspot.com/do/automatic?example=0

https://gist.github.com/acobaugh/5aecffbaaa593d80022b3534e5363a2d

Reading Kerberos KDC Logs and Parsing Them for Events via Apache NiFi 1.8 - Part 1

Apache NiFi