Getting random duplicate data in Nifi... WHY?

Rising Star

[Attachment: 40884-screen-shot-2017-10-17-at-94125-am.png]

[Attachment: 40885-nififlowoct1703.png]

We are intermittently receiving duplicates of every event in our flow setup. I suspect it has to do with the SplitText processors at the start of the flow, but I would like some professional feedback on this. My flows are shown in the attached screenshots, and I can provide more information or screenshots of configs on request. The SplitText processors split the incoming data into 100k bundles and then into 10k bundles. This was implemented to ease the burden when we get a blast of data all at once (GBs worth), and it seems to handle that, but now I fear it is causing this issue. That said, there were times before when duplicate data occurred, though far less frequently.
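For reference, here is a minimal Python model of the two-stage split described above. It is not NiFi code: the function names are made up for illustration, and it assumes the bundles are counted in lines and that the split is a plain sequential chunking. Under those assumptions each input line ends up in exactly one 10k bundle, so the chained split on its own would not be expected to produce duplicates.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_lines(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def two_stage_split(
    lines: Iterable[str], first: int = 100_000, second: int = 10_000
) -> Iterator[List[str]]:
    """Split into `first`-sized bundles, then split each into `second`-sized bundles."""
    for big_bundle in split_lines(lines, first):
        yield from split_lines(big_bundle, second)


# Illustrative input: 250,000 synthetic events, one per line.
events = [f"event-{i}" for i in range(250_000)]
bundles = list(two_stage_split(events))
flattened = [line for bundle in bundles for line in bundle]

print(len(bundles))                            # number of 10k bundles produced
print(len(flattened) == len(set(flattened)))   # True when no line appears twice
```

If the real flow behaves like this model, the duplicates would have to come from somewhere other than the split arithmetic, for example from the same input being processed more than once.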

3 REPLIES

Master Guru

@Eric Lloyd

My suggestion would be to use Provenance to track the lineage of two FlowFiles believed to be duplicates to identify the source of the duplication.
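As a rough illustration of what tracking lineage can tell you, the sketch below works on provenance events that have been exported as plain dictionaries. The field names (`eventType`, `flowFileUuid`, `parentUuids`, `childUuids`) are modeled on NiFi provenance events but should be treated as assumptions here, and the sample data and helper functions are hypothetical. The idea is to walk both suspected duplicates back through their parents and see whether they share an ancestor.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set


def build_parent_map(events: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each child FlowFile UUID to the UUIDs it was derived from."""
    parents: Dict[str, Set[str]] = {}
    for ev in events:
        # Assumption: events that create children list them in childUuids.
        sources = ev.get("parentUuids") or [ev["flowFileUuid"]]
        for child in ev.get("childUuids", []):
            parents.setdefault(child, set()).update(sources)
    return parents


def ancestors(parents: Dict[str, Set[str]], uuid: str) -> Dict[str, int]:
    """Return every ancestor of uuid (including itself) with its distance."""
    dist = {uuid: 0}
    queue = deque([uuid])
    while queue:
        cur = queue.popleft()
        for parent in parents.get(cur, ()):
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist


def nearest_common_ancestor(
    parents: Dict[str, Set[str]], a: str, b: str
) -> Optional[str]:
    """Return the closest shared ancestor of two FlowFiles, or None."""
    da, db = ancestors(parents, a), ancestors(parents, b)
    common = set(da) & set(db)
    if not common:
        return None
    return min(sorted(common), key=lambda u: da[u] + db[u])


# Hypothetical exported events: two source files, each split by the flow.
sample_events = [
    {"eventType": "RECEIVE", "flowFileUuid": "src-1", "parentUuids": [], "childUuids": []},
    {"eventType": "FORK", "flowFileUuid": "src-1", "parentUuids": ["src-1"],
     "childUuids": ["big-1", "big-2"]},
    {"eventType": "FORK", "flowFileUuid": "big-1", "parentUuids": ["big-1"],
     "childUuids": ["small-1", "small-2"]},
    {"eventType": "RECEIVE", "flowFileUuid": "src-2", "parentUuids": [], "childUuids": []},
    {"eventType": "FORK", "flowFileUuid": "src-2", "parentUuids": ["src-2"],
     "childUuids": ["small-9"]},
]

parent_map = build_parent_map(sample_events)
print(nearest_common_ancestor(parent_map, "small-1", "small-2"))
print(nearest_common_ancestor(parent_map, "small-1", "small-9"))
```

If the two duplicates share an ancestor, the duplication most likely happened inside the flow, at or after the component that produced them from that ancestor. If they share none, the same data probably entered NiFi twice, which points at the ingest side rather than at the split.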

Thanks,

Matt

Rising Star

Could you expand on "track the lineage"? Do you mean seeing which file the data comes from? The data I suspect to be duplicated comes from the same file. Are there other data provenance elements to look at that might reveal something? Thanks.

Rising Star

I will say it's odd: we also use HUNK, and I'm looking at the ten different flows and the duplication that occurred over the last 24 hours. It seems that from 7 am to 8 am, all the data was duplicated, just for that hour, on all 10 independent flows. That seems significant to me. What are the odds of a fluke in 10 different processors causing duplication at the same time? Does this hint at a culprit?
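One way to put numbers on that observation is to count duplicates per flow and per hour. The sketch below is only an illustration: it assumes the events can be exported as `(flow_name, iso_timestamp, payload)` tuples, treats a repeated payload within the same flow as a duplicate, and uses made-up sample records and a hypothetical function name.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def duplicates_per_hour(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], int]:
    """Count repeated payloads per (flow, hour) bucket.

    The first occurrence of a payload in a flow is treated as the original;
    every later occurrence is counted in the hour in which it was seen.
    """
    seen: Counter = Counter()
    dupes: Counter = Counter()
    for flow, timestamp, payload in records:
        key = (flow, payload)
        seen[key] += 1
        if seen[key] > 1:
            hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
            dupes[(flow, hour)] += 1
    return dict(dupes)


# Made-up sample records for two flows.
sample_records = [
    ("flow-1", "2017-10-17T07:05:00", "a"),
    ("flow-1", "2017-10-17T07:06:00", "a"),
    ("flow-1", "2017-10-17T09:00:00", "b"),
    ("flow-2", "2017-10-17T07:30:00", "x"),
    ("flow-2", "2017-10-17T07:31:00", "x"),
]

hourly = duplicates_per_hour(sample_records)
for (flow, hour), count in sorted(hourly.items()):
    print(flow, hour, count)
```

If every flow shows its duplicates in the same hour bucket, that would suggest a cause shared by all ten flows rather than ten separate processor faults.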
