Nifi's PutHDFS processor attribute Conflict Resolution Strategy: Append causing data loss
- Labels: Apache NiFi
Created ‎04-19-2017 09:02 PM
We have a PutHDFS processor with the attribute Conflict Resolution Strategy set to append so we can group together events based on a specific hour.
What we found is that the first event of the bin being appended is concatenated onto the last event of the previously written data. The timestamp then becomes text in the middle of a line, and data loss occurs.
Example:
Apr 19 1:06:59 event data here event data here event data hereApr 19 1:07:00 event data here...
Should be
Apr 19 1:06:59 event data here event data here event data here
Apr 19 1:07:00 event data here...
Has anyone else experienced this problem or found a workaround?
Created ‎04-20-2017 01:38 PM
I wrote this on a different question yesterday, but it applies to this one as well.
Regarding PutHDFS and appending, I believe this is expected behavior. PutHDFS has no idea what it is writing to HDFS; it's just writing bytes, which may or may not represent text. If you were appending parts of an image or video, there would be no such thing as new lines.
If you want a new line when you start appending, then you need the previously written data to end with a new line, or the next data to start with a new line. This should be easily done by manipulating the data in the flow before PutHDFS.
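To make the byte-level behavior concrete, here is a small Python sketch. It is not NiFi code: a local `bytearray` stands in for the HDFS file, and `append_bytes` and `ensure_trailing_newline` are hypothetical helpers made up for this illustration.

```python
def append_bytes(target: bytearray, chunk: bytes) -> None:
    """Append raw bytes exactly as given; nothing is added between writes."""
    target.extend(chunk)


def ensure_trailing_newline(chunk: bytes) -> bytes:
    """Return the chunk ending in a newline, adding one only if it is missing."""
    return chunk if chunk.endswith(b"\n") else chunk + b"\n"


first_bin = b"Apr 19 1:06:59 event data here"
second_bin = b"Apr 19 1:07:00 event data here"

# Plain append: the two bins fuse into a single line.
fused = bytearray()
append_bytes(fused, first_bin)
append_bytes(fused, second_bin)

# Append after terminating each bin with a newline: one event per line.
fixed = bytearray()
append_bytes(fixed, ensure_trailing_newline(first_bin))
append_bytes(fixed, ensure_trailing_newline(second_bin))

print(fused.decode().splitlines())
print(fixed.decode().splitlines())
```

The first list has one fused entry and the second has two separate events, which is the difference between the "Example" and "Should be" output in the question.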
Created ‎04-20-2017 06:56 AM
You could use a ReplaceText processor to append a '\n' (line break) to each event before you route it to the PutHDFS processor.
Created ‎04-20-2017 01:19 PM
Thank you for this advice, but it requires me to use regex. The issue is that the ending character(s) of incoming events will differ depending on the type of setup we are using. I guess I can search for the start of the timestamp and put the "\n" before it, which I will try.
It's unfortunate, and it seems like a gross oversight if the append feature works this way, combining the previous event and the appended event together. I was hoping for a solution integrated into the technology rather than a workaround.
Created ‎04-20-2017 04:46 PM
I was thinking that too, and Hellmar's answer gave me a clue as to how to do it. However, using ReplaceText to add a newline doesn't let me specify "add a new line right after a specific bin of events" or "add a new line right before the first line in this bin of events". Instead, it lets me use regex to find keywords in the data. I am putting a newline before the timestamp, which works but also adds an extra line after every event.
Is there a way to specify "put a newline at the end of this bin of events before the append happens"?
Created ‎04-20-2017 04:53 PM
Using ReplaceText with the Replacement Strategy set to Prepend and the Evaluation Mode set to Entire Text will put the Replacement Value at the beginning of the content. The same thing can be done with a Replacement Strategy of Append, which places the Replacement Value at the end.
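As a rough illustration of why this differs from the regex approach tried earlier in the thread, the Python sketch below models both. It is only a model of the described behavior, not NiFi's implementation; `prepend_entire_text`, `append_entire_text`, and the timestamp pattern are assumptions made for the example.

```python
import re


def prepend_entire_text(content: str, value: str) -> str:
    """Model of Prepend + Entire Text: the value is added once, at the start."""
    return value + content


def append_entire_text(content: str, value: str) -> str:
    """Model of Append + Entire Text: the value is added once, at the end."""
    return content + value


# One bin holding two events, already separated by a newline.
bin_content = "Apr 19 1:07:00 event data here\nApr 19 1:07:01 event data here"

# Regex approach: put a newline before every timestamp at the start of a line.
# The pattern is a guess at the syslog-style timestamp shown in the question.
timestamp = re.compile(r"^[A-Z][a-z]{2} \d{1,2} \d{1,2}:\d{2}:\d{2}", re.MULTILINE)
regex_result = timestamp.sub(lambda m: "\n" + m.group(0), bin_content)

# Entire-text approach: exactly one newline, at the end of the bin.
appended = append_entire_text(bin_content, "\n")

print(repr(regex_result))
print(repr(appended))
```

The regex result contains a blank line between events, while the entire-text result only gains a single trailing newline.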
Alternatively, if you are using MergeContent (I can't remember), you can set the Delimiter Strategy to Text and use the Header or Footer property to enter a new line. You can use shift+enter as the property value for the Header or Footer to create a new line.
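The effect of a newline Footer can be sketched in Python as follows. This is a simplified model, not MergeContent itself: `merge_with_text_delimiters` is a hypothetical function, and its `demarcator` parameter (a separator between merged items) is an assumption about how events inside a bin are kept apart, which the thread does not discuss.

```python
from typing import Iterable


def merge_with_text_delimiters(
    contents: Iterable[bytes],
    header: bytes = b"",
    footer: bytes = b"",
    demarcator: bytes = b"",
) -> bytes:
    """Join the items of one bin, wrapping the result in a header and footer."""
    return header + demarcator.join(contents) + footer


bin_one = [b"Apr 19 1:06:58 event data here", b"Apr 19 1:06:59 event data here"]
bin_two = [b"Apr 19 1:07:00 event data here", b"Apr 19 1:07:01 event data here"]

# Each merged bin ends with the footer, so a later append starts on a new line.
hdfs_file = bytearray()
for events in (bin_one, bin_two):
    hdfs_file.extend(
        merge_with_text_delimiters(events, footer=b"\n", demarcator=b"\n")
    )

print(hdfs_file.decode())
```

Because the footer is added once per bin rather than per event, no regex is needed and no extra blank lines appear.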
Created ‎04-20-2017 05:26 PM
Awesome! Yes! This is what I was looking for I think
Created ‎04-20-2017 05:42 PM
BTW I ended up using the Footer property in MergeContent and it worked wonderfully with no regex involved.
