
Getting random duplicate data in Nifi... WHY?


Rising Star

40884-screen-shot-2017-10-17-at-94125-am.png

40885-nififlowoct1703.png

We are randomly receiving duplicates of every event in our flow setup. I suspect it has to do with the SplitText processors at the beginning of the flow, but I would like some professional feedback on this. My flows are in the attached screenshots, and I will provide any further information or config screenshots on request. The SplitText processors split the incoming data into 100k bundles and then 10k bundles. This was implemented to ease the burden if we get a blast of data all at once (GBs worth), and it seems to handle that, but now I fear it's causing this issue. That said, there were times before when duplicate data occurred, but not nearly as frequently.

3 REPLIES

Re: Getting random duplicate data in Nifi... WHY?

Master Guru

@Eric Lloyd

My suggestion would be to use Provenance to track the lineage of two FlowFiles believed to be duplicates to identify the source of the duplication.

Thanks,

Matt
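
As a rough illustration of what tracing lineage can look like outside the NiFi UI, here is a minimal Python sketch that walks parent links between provenance events to find where two suspected duplicates share an ancestor. It assumes the events have already been exported into plain dictionaries; the field names (flowFileUuid, parentUuids, childUuids, eventType, componentName) are modeled on NiFi's provenance events but should be treated as assumptions, and the sample UUIDs and component names are made up.

```python
def build_parent_map(events):
    """Map each child FlowFile UUID to (parent UUID, event that produced the child)."""
    parent_of = {}
    for ev in events:
        for child in ev.get("childUuids", []):
            for parent in ev.get("parentUuids", []):
                parent_of[child] = (parent, ev)
    return parent_of


def lineage(uuid, parent_of):
    """Return the chain from a FlowFile back to its root as (uuid, eventType, componentName)."""
    chain = []
    seen = set()
    while uuid in parent_of and uuid not in seen:
        seen.add(uuid)  # guard against accidental cycles in the exported data
        parent, ev = parent_of[uuid]
        chain.append((uuid, ev["eventType"], ev["componentName"]))
        uuid = parent
    chain.append((uuid, "ROOT", None))
    return chain


def first_common_ancestor(uuid_a, uuid_b, parent_of):
    """Return the nearest FlowFile UUID present in both lineages, or None."""
    in_b = {u for u, _, _ in lineage(uuid_b, parent_of)}
    for u, _, _ in lineage(uuid_a, parent_of):
        if u in in_b:
            return u
    return None


# Hypothetical exported events: one source file split in two, then one split cloned.
events = [
    {"eventType": "FORK", "componentName": "SplitText",
     "flowFileUuid": "f0", "parentUuids": ["f0"], "childUuids": ["s1", "s2"]},
    {"eventType": "CLONE", "componentName": "RouteOnAttribute",
     "flowFileUuid": "s1", "parentUuids": ["s1"], "childUuids": ["c1"]},
]
parent_of = build_parent_map(events)
print(lineage("c1", parent_of))
print(first_common_ancestor("s1", "c1", parent_of))
```

If two duplicates share an ancestor, the event that created the later copy (for example a CLONE or FORK) points at the component to inspect. If they share no ancestor at all, the same source data was most likely ingested twice, which shifts suspicion upstream of the split processors.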

Re: Getting random duplicate data in Nifi... WHY?

Rising Star

Could you expand on "track the lineage"? Do you mean seeing which file it comes from? What I suspect to be duplicate data comes from the same file. Are there other data provenance elements to look at that might reveal something? Thanks.

Re: Getting random duplicate data in Nifi... WHY?

Rising Star

I will say it's odd: we also use HUNK, and I'm looking at the ten different flows and the duplication that occurred over the last 24 hours. It seems that from 7 am to 8 am, all the data for that hour was duplicated on all 10 independent flows. That seems significant to me. What are the odds of a fluke in 10 different processors causing duplication at the same time? Does this hint at a possible culprit?
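
One way to check whether the duplication really lines up across flows is to bucket repeated events by hour. The following is a minimal Python sketch, assuming each record has been exported as a (flow name, ISO timestamp, event key) tuple; the flow names, timestamps, and keys below are hypothetical sample data, not values from this system.

```python
from collections import Counter
from datetime import datetime


def duplicate_counts_by_hour(records):
    """Count repeat sightings of an event key, grouped by (flow, hour of the repeat)."""
    seen = set()
    dupes = Counter()
    for flow, ts, event_key in records:
        key = (flow, event_key)
        if key in seen:
            hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
            dupes[(flow, hour.isoformat())] += 1
        else:
            seen.add(key)
    return dupes


def hours_hit_in_every_flow(dupes, flows):
    """Return the hours in which every listed flow shows at least one duplicate."""
    flows_per_hour = {}
    for (flow, hour), _count in dupes.items():
        flows_per_hour.setdefault(hour, set()).add(flow)
    return sorted(h for h, hit in flows_per_hour.items() if hit >= set(flows))


# Hypothetical sample: two flows, both duplicating in the 07:00 hour.
records = [
    ("flow1", "2017-10-17T07:05:00", "a"),
    ("flow1", "2017-10-17T07:06:00", "a"),
    ("flow2", "2017-10-17T07:10:00", "b"),
    ("flow2", "2017-10-17T07:40:00", "b"),
    ("flow1", "2017-10-17T09:00:00", "c"),
]
dupes = duplicate_counts_by_hour(records)
print(dupes)
print(hours_hit_in_every_flow(dupes, ["flow1", "flow2"]))
```

An hour that shows up for every independent flow suggests something shared by all of them (the source, the host, or a restart) rather than a misconfiguration in one processor.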
