How to prevent duplicates when ingesting into MySQL using NiFi?
Labels: Apache NiFi
Created on 09-21-2016 09:01 PM - edited 08-18-2019 04:29 AM
I have a dataflow that ingests a file from SFTP into MySQL, and I would like to know how to stop NiFi from inserting an enormous number of duplicate rows into MySQL. Details are attached below. Thanks.
(1) Data flow
(2) Row count in MySQL after NiFi ingestion
(3) Original data on SFTP
Created 09-21-2016 09:07 PM
In many parts of your flow you have multiple relationships routed to the next processor when you probably want only one. Some examples:
- Between SplitText and ExtractText you have original and splits connected, but you probably only want splits here.
- Between ExtractText and ReplaceText you have matched and unmatched, but you probably only want matched.
- Between ReplaceText and PutSQL you have success and failure, but you probably only want success.
- On PutSQL you have failure, success, and retry all routed back to itself, and you probably only want retry (you definitely don't want success routed back to itself).
You would most likely auto-terminate these other relationships (on the first tab when configuring a processor).
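To see why this wiring multiplies rows, here is a toy model of the flow in plain Python. It is not NiFi code and makes loud simplifying assumptions: every flowfile that reaches PutSQL inserts exactly one row per execution, every statement succeeds, and the hypothetical `putsql_passes` parameter stands in for how many times a looped-back `success` flowfile gets re-run before you look at the table (in the real flow the loop never stops on its own).

```python
def simulate_rows_inserted(lines, route_original, loop_success, putsql_passes=1):
    """Count rows a toy model of the SFTP -> MySQL flow inserts.

    Hypothetical model (an assumption, not NiFi's exact behavior): each
    flowfile reaching PutSQL inserts one row per execution and always succeeds.
    """
    # SplitText 'splits': one flowfile per line of the SFTP file.
    flowfiles = list(lines)
    # If 'original' is also connected to ExtractText, the whole unsplit
    # file travels downstream as one extra flowfile.
    if route_original:
        flowfiles.append("\n".join(lines))

    # ExtractText / ReplaceText: with 'matched' + 'unmatched' (and
    # 'success' + 'failure') all connected, nothing is filtered out,
    # so the model passes every flowfile through unchanged.

    # PutSQL: with 'success' auto-terminated, each flowfile runs once.
    # With 'success' looped back to PutSQL, the same statement re-runs on
    # every pass; NiFi would loop indefinitely, so the model caps it.
    passes = putsql_passes if loop_success else 1
    return len(flowfiles) * passes


lines = ["1,alice", "2,bob", "3,carol"]
# Suggested wiring: splits only, success auto-terminated.
print(simulate_rows_inserted(lines, route_original=False, loop_success=False))  # 3
# Misconfigured wiring observed after 10 PutSQL passes.
print(simulate_rows_inserted(lines, route_original=True, loop_success=True,
                             putsql_passes=10))  # 40
```

Under these assumptions the `success` self-loop is the main culprit: the row count keeps growing with every pass rather than settling at the number of lines in the file.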
Created on 09-21-2016 09:25 PM - edited 08-18-2019 04:29 AM
@Bryan Bende Thanks a lot for the help, it works perfectly. I attached a snippet of the updated workflow in case anyone runs into the same issue in the future. Thanks again.
