Support Questions

SplitContent is adding a newline in NiFi - why?

elloyd · ‎07-13-2017

Currently, we have a setup where we group log lines into events and split them on the timestamp, so that stack trace errors, which contain newlines, stay together as a single event.

We are currently using the SplitContent processor to split on the byte sequence "(newline)20".

This is the format of the logs (this example is supposed to appear single-spaced; I hope it does when posted):

2017-07-13 01:00:00,123 Log data here
2017-07-13 01:00:00,124 Log data here
2017-07-13 01:00:00,125 Stack trace error here...
Stack trace error ....
Stack trace error....
Stack trace error.....
2017-07-13 01:00:00,126 Log data here

Using "(newline) 20" allows us to keep everything between two timestamps together as one event, including the stack trace. Oddly enough, the events that end up in HDFS have an extra blank line between them. (Not usually a big deal, but with 2-3 GB files we are seeing 100+ MB of space taken up by blank lines alone.) Our current solution is a ReplaceText processor that removes all the blank lines, but it's obviously not optimal.

Any suggestions are welcome. Please see screenshots.

Jelena_lmmm · ‎12-17-2018

You are getting a new line in each split because of how you set the properties. Basically you are saying: find every (new line)20 in the flow and split on it (Byte Sequence: (new line)20), keep that byte sequence in each split (Keep Byte Sequence: true), and add it to the beginning of the split (Byte Sequence Location: Leading). So every split starts with the (new line) that was part of the byte sequence.
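To make the effect of those three properties concrete, here is a rough Python model of the split. It is only a sketch of the behavior described above, not NiFi's actual implementation; the function name `split_keep_leading` is made up for illustration, and the final join with a newline is an assumption about how the splits are written out downstream.

```python
def split_keep_leading(data: bytes, seq: bytes) -> list:
    """Split data at each occurrence of seq, keeping seq at the start
    of the piece that follows it (a simplified model of
    Keep Byte Sequence: true with Byte Sequence Location: Leading)."""
    pieces = []
    start = 0
    while True:
        # Search past the current start so a leading seq is not matched again.
        pos = data.find(seq, start + 1)
        if pos == -1:
            pieces.append(data[start:])
            return pieces
        pieces.append(data[start:pos])
        start = pos


log = (
    b"2017-07-13 01:00:00,123 Log data here\n"
    b"2017-07-13 01:00:00,124 Log data here\n"
    b"2017-07-13 01:00:00,125 Stack trace error here...\n"
    b"Stack trace error ....\n"
    b"2017-07-13 01:00:00,126 Log data here\n"
)

pieces = split_keep_leading(log, b"\n20")

# Every piece after the first starts with the newline from the byte sequence.
for piece in pieces:
    print(repr(piece))

# Assumption: if the pieces are later joined with a newline delimiter,
# that leading newline shows up as a blank line between events.
joined = b"\n".join(pieces)
```

In this model the stack trace stays attached to its timestamped line, but each split carries a leading newline, which would explain the blank lines if a newline is also added when the events are written.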

You can split your flow with Byte Sequence: 20 (without new line).

MattWho · ‎12-17-2018

@Eric Lloyd

You may want to look into using the SplitRecord processor instead of SplitContent. You could use the GrokReader to split your log input. Here is a great article that includes a sample grok pattern for NiFi's log format:

https://community.hortonworks.com/articles/131320/using-partitionrecord-grokreaderjsonwriter-to-pars...
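As a rough illustration of the record-oriented idea, the Python sketch below groups lines into events by checking whether a line starts with a timestamp. This is not GrokReader or its configuration; the regular expression and the `group_events` helper are assumptions based only on the log sample in the question.

```python
import re

# Assumed timestamp layout, taken from the sample: 2017-07-13 01:00:00,123
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def group_events(lines):
    """Start a new event at each timestamped line; attach any other
    line (e.g. a stack trace line) to the current event."""
    events = []
    for line in lines:
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


log = """2017-07-13 01:00:00,123 Log data here
2017-07-13 01:00:00,124 Log data here
2017-07-13 01:00:00,125 Stack trace error here...
Stack trace error ....
Stack trace error....
2017-07-13 01:00:00,126 Log data here
"""

events = group_events(log.splitlines())
for event in events:
    print(repr(event))
```

Because the event boundary is decided per line rather than by a byte sequence that includes the newline, no event begins with a stray newline.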

Thank you,

Matt
