How to prevent duplicates when ingesting into MySQL using NiFi?

Contributor

I have a dataflow that ingests files from SFTP into MySQL, and I would like to know how to prevent NiFi from ingesting an enormous number of duplicates into MySQL. Details are attached below. Thanks.

(1) Data flow

7853-updatednifiwrkfl.jpg

(2) Count after NiFi ingest into MySQL

7854-countfromflowfile.jpg

(3) Original data on SFTP

7855-originaldata.jpg

1 ACCEPTED SOLUTION

In many parts of your flow you have multiple relationships routed to the next processor when you probably want only one. Some examples:

  • Between SplitText and ExtractText you have original and splits connected, but you probably only want splits here.
  • Between ExtractText and ReplaceText you have matched and unmatched, but you probably only want matched.
  • Between ReplaceText and PutSQL you have success and failure, but you probably only want success.
  • On PutSQL you have failure, success, and retry all routed back to itself, but you probably only want retry (you definitely don't want success routed back to itself, since every inserted row would then be sent to PutSQL again).

You would most likely auto-terminate these other relationships (on the first tab when configuring a processor).
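To see why the extra connections multiply rows, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not NiFi code: the function name `rows_inserted` and its parameters are made up for illustration, it assumes the whole-file "original" flowfile ends up as one extra statement at PutSQL, and it caps the success loop at a fixed number of passes (in a real flow the loop would keep going).

```python
def rows_inserted(num_lines, route_original, loop_success_passes):
    """Count INSERT attempts reaching PutSQL in a toy model of the flow.

    num_lines           -- lines in the source file (hypothetical input)
    route_original      -- True if SplitText's 'original' is also connected downstream
    loop_success_passes -- how many times 'success' is fed back into PutSQL
                           (assumed finite here purely so the model terminates)
    """
    # SplitText emits one flowfile per line on 'splits', plus the whole
    # file on 'original' if that relationship is connected as well.
    flowfiles = num_lines + (1 if route_original else 0)
    # Each flowfile is inserted once, and once more for every pass in which
    # 'success' is routed back into PutSQL.
    return flowfiles * (1 + loop_success_passes)


print(rows_inserted(100, False, 0))  # only 'splits' connected, no success loop
print(rows_inserted(100, True, 0))   # 'original' also connected
print(rows_inserted(100, True, 3))   # plus three passes through the success loop
```

Under these assumptions a 100-line file gives 100, 101, and 404 insert attempts respectively, so the success loop on PutSQL is by far the biggest contributor to the duplicates.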

2 REPLIES

Contributor

@Bryan Bende thanks a lot for the help, it works perfectly. I attached a snippet of the updated workflow in case anyone experiences this issue in the future. Thanks again.

7856-workingnifiwrkflow.jpg