SplitContent is adding a newline in NiFi - why?

Expert Contributor

Currently, we have a setup where we group events by timestamp and split them on timestamp boundaries, so that stack trace errors, which contain newlines, stay together as a single event.

We are currently using the SplitContent processor to split on the byte sequence "(newline)20", that is, a newline immediately followed by the characters 20.

This is the format of the logs (single spaced):

2017-07-13 01:00:00,123 Log data here
2017-07-13 01:00:00,124 Log data here
2017-07-13 01:00:00,125 Stack trace error here...
Stack trace error ....
Stack trace error....
Stack trace error.....
2017-07-13 01:00:00,126 Log data here

Splitting on "(newline)20" lets us keep everything between two timestamps as one event, including the stack trace. Oddly enough, the events written to HDFS have an extra blank line between each of them. (Not usually a big deal, but with 2-3 GB files we are seeing 100+ MB of nothing but blank lines.) Our current workaround is a ReplaceText processor that removes all the blank lines, but it's obviously not optimal.

Any suggestions are welcome. Please see screenshots.

20418-screen-shot-2017-07-13-at-14550-pm.png

20419-screen-shot-2017-07-13-at-14541-pm.png

20420-screen-shot-2017-07-13-at-14535-pm.png

2 REPLIES

New Contributor

You are getting a newline in each split because of how you set the properties. Basically you are saying: find every (newline)20 in the flow file content and split there (Byte Sequence: (newline)20), keep the byte sequence in each split (Keep Byte Sequence: true), and put it at the beginning of the split (Byte Sequence Location: Leading). As a result, every split after the first starts with a newline.

You can split your flow file on Byte Sequence: 20 (without the newline).
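To make the effect of those three properties concrete, here is a minimal Python sketch of a split that keeps the byte sequence at the start of each following piece. It is a simplified model for illustration only, not NiFi's implementation; the function name split_keep_leading and the sample data are made up. The last step assumes that something downstream joins the splits with a newline (for example a merge step with a newline demarcator), which may or may not match the flow in the screenshots.

```python
def split_keep_leading(data: bytes, seq: bytes) -> list:
    """Split data at each occurrence of seq, leaving seq at the start of the next piece."""
    pieces = []
    start = 0
    while True:
        # Search past the first byte of the current piece so that the
        # sequence we just split on is not matched again.
        idx = data.find(seq, start + 1)
        if idx == -1:
            pieces.append(data[start:])
            return pieces
        pieces.append(data[start:idx])
        start = idx


log = (
    b"2017-07-13 01:00:00,123 Log data here\n"
    b"2017-07-13 01:00:00,124 Stack trace error here...\n"
    b"Stack trace error ....\n"
    b"2017-07-13 01:00:00,126 Log data here\n"
)

pieces = split_keep_leading(log, b"\n20")

# Every piece after the first begins with the kept newline.
for piece in pieces:
    print(repr(piece))

# Assumption: a downstream step joins the pieces with "\n".
# The kept leading newline then shows up as a blank line between events.
rejoined = b"\n".join(pieces)
print(b"\n\n" in rejoined)
```

In this model the stack trace stays attached to its timestamped line, but the leading newline travels with each event, so any step that adds its own line separator doubles it.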

Super Mentor
@Eric Lloyd

You may want to look into using the SplitRecord processor instead of SplitContent. You could use the GrokReader to split your log input. Here is a great article that includes a sample grok pattern for NiFi's own log format:

https://community.hortonworks.com/articles/131320/using-partitionrecord-grokreaderjsonwriter-to-pars...
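As a rough illustration of the record-oriented idea (one record per timestamped line, with continuation lines folded into the previous record), here is a small Python sketch. It is not the GrokReader and does not use grok syntax; the regular expression, the function name group_events, and the sample text are assumptions based on the log format shown in the question.

```python
import re

# Assumed timestamp layout from the question: "2017-07-13 01:00:00,123 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def group_events(text: str) -> list:
    """Group lines into events; a line without a leading timestamp joins the previous event."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue  # drop blank lines instead of carrying them along
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


sample = (
    "2017-07-13 01:00:00,123 Log data here\n"
    "2017-07-13 01:00:00,124 Log data here\n"
    "2017-07-13 01:00:00,125 Stack trace error here...\n"
    "Stack trace error ....\n"
    "Stack trace error....\n"
    "Stack trace error.....\n"
    "2017-07-13 01:00:00,126 Log data here\n"
)

events = group_events(sample)
for event in events:
    print(repr(event))
```

Because each event is built from whole lines rather than cut at a byte sequence, no separator is carried into the event text.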


Thank you,

Matt