SplitContent is adding a newline in Nifi -why?

Rising Star

We currently have a setup that groups log events by timestamp and splits them on the timestamp, so that stack trace errors, which contain newlines, stay together as a single event.

We are using SplitContent to split on the byte sequence "(newline)20", that is, a newline followed by the characters 20.

This is the format of the logs: (this example is supposed to appear single spaced, I hope it does when posted)

2017-07-13 01:00:00,123 Log data here
2017-07-13 01:00:00,124 Log data here
2017-07-13 01:00:00,125 Stack trace error here...
Stack trace error ....
Stack trace error....
Stack trace error.....
2017-07-13 01:00:00,126 Log data here

Splitting on "(newline)20" lets us keep everything between two timestamps as one event, including the stack trace. Oddly enough, the events written to HDFS end up with an extra blank line between them. (Not usually a big deal, but with 2-3 GB files we are seeing 100+ MB of space taken up by blank lines alone.) Our current workaround is a ReplaceText processor that removes all the blank lines, but it's obviously not optimal.
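To make the intended grouping concrete, here is a minimal Python sketch of it outside NiFi. It is only an illustration, not a NiFi configuration: the name `group_events` and the regular expression are assumptions based on the sample log format above. It splits at a newline only when a timestamp follows and consumes that newline, so no event starts with a blank line.

```python
import re
from typing import List

# Assumption: every event starts with "YYYY-MM-DD HH:MM:SS,mmm " as in the sample.
# The lookahead means the newline before a timestamp is consumed by the split,
# while the timestamp itself stays at the start of the next event.
EVENT_BOUNDARY = re.compile(r"\n(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} )")

def group_events(text: str) -> List[str]:
    """Return one string per event; stack trace lines stay with their event."""
    return EVENT_BOUNDARY.split(text)

sample = (
    "2017-07-13 01:00:00,123 Log data here\n"
    "2017-07-13 01:00:00,124 Log data here\n"
    "2017-07-13 01:00:00,125 Stack trace error here...\n"
    "Stack trace error ....\n"
    "Stack trace error....\n"
    "Stack trace error.....\n"
    "2017-07-13 01:00:00,126 Log data here"
)

events = group_events(sample)
for event in events:
    print(repr(event))
```

The third event keeps its three continuation lines, and none of the events begins with a newline.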

Any suggestions are welcome. Please see screenshots.

20418-screen-shot-2017-07-13-at-14550-pm.png

20419-screen-shot-2017-07-13-at-14541-pm.png

20420-screen-shot-2017-07-13-at-14535-pm.png

Re: SplitContent is adding a newline in Nifi -why?

New Contributor

You are getting a newline in each split because of how the properties are set. In effect you are telling the processor: find every (newline)20 in the flow file and split there (Byte Sequence: (newline)20), keep that byte sequence in each split (Keep Byte Sequence: true), and put it at the beginning of the split (Byte Sequence Location: Leading). Each split after the first therefore starts with a newline.

You can instead split with Byte Sequence: 20 (without the newline).
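The effect described above can be reproduced with a small Python simulation. This is a sketch of the described behaviour, not NiFi's actual implementation: `split_keep_leading` is a hypothetical helper, and joining the pieces with a newline is an assumption about what happens downstream before the data lands in HDFS.

```python
from typing import List

def split_keep_leading(data: bytes, seq: bytes) -> List[bytes]:
    """Split at each occurrence of seq, keeping seq at the START of the next piece.

    Hypothetical stand-in for Keep Byte Sequence: true with
    Byte Sequence Location: Leading.
    """
    pieces = []
    start = 0
    pos = data.find(seq, 1)  # start at 1 so a match at offset 0 yields no empty piece
    while pos != -1:
        pieces.append(data[start:pos])
        start = pos
        pos = data.find(seq, pos + len(seq))
    pieces.append(data[start:])
    return pieces

log = (
    b"2017-07-13 01:00:00,124 Log data here\n"
    b"2017-07-13 01:00:00,125 Stack trace error here...\n"
    b"Stack trace error ....\n"
    b"2017-07-13 01:00:00,126 Log data here"
)

splits = split_keep_leading(log, b"\n20")

# Assumption: something downstream separates events with a newline.
rejoined = b"\n".join(splits)

# Dropping the leading newline from each piece restores the original layout.
stripped = [s.lstrip(b"\n") for s in splits]

for s in splits:
    print(repr(s))
```

Every piece after the first begins with `\n`, so any step that also separates events with a newline produces a blank line between them.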

Re: SplitContent is adding a newline in Nifi -why?

Master Guru
@Eric Lloyd

You may want to look into using the SplitRecord processor instead of SplitContent. You could use the GrokReader to split your log input. Here is a great article that includes a sample grok pattern for the NiFi log format:

https://community.hortonworks.com/articles/131320/using-partitionrecord-grokreaderjsonwriter-to-pars...

Thank you,

Matt
