Rssfeed (xml) check pubDate before ingesting data

I have a NiFi flow that ingests XML data and pushes it to an HBase table. We noticed that the XML data ingested every 5 minutes has the same content. Is there a way to add a processor that checks the data, or its pubDate, to see whether it has changed since the data previously pushed to the HBase table?

Re: Rssfeed (xml) check pubDate before ingesting data

@melvint 

 

You might look into using the "HashContent" and "DetectDuplicate" processors. You can create a hash of the content of each of your FlowFiles and use DetectDuplicate to see whether a FlowFile's hash matches a previous hash that was already processed. If so, the duplicate is routed out of your regular dataflow path so you don't send it to HBase.
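
To make the idea concrete, here is a rough Python sketch of the logic those two processors perform together. It is not NiFi code and does not use any NiFi API: the class name, the function name, the in-memory set (standing in for the cache that DetectDuplicate relies on), and the choice of SHA-256 are all assumptions made for illustration only.

```python
import hashlib


class DuplicateDetector:
    """Hypothetical stand-in for the HashContent + DetectDuplicate pair."""

    def __init__(self):
        # Assumption: a plain in-memory set plays the role of the cache
        # that remembers hashes of content already processed.
        self._seen = set()

    def is_duplicate(self, content: bytes) -> bool:
        # Step 1 (the "HashContent" part): hash the full content.
        digest = hashlib.sha256(content).hexdigest()
        # Step 2 (the "DetectDuplicate" part): check and record the hash.
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


def route(detector: DuplicateDetector, content: bytes) -> str:
    """Return the name of the path the content would take."""
    return "duplicate" if detector.is_duplicate(content) else "non-duplicate"


if __name__ == "__main__":
    detector = DuplicateDetector()
    feed = b"<rss><channel><item><pubDate>Mon, 01 Jan 2024</pubDate></item></channel></rss>"
    print(route(detector, feed))  # first fetch: new content
    print(route(detector, feed))  # same bytes fetched again 5 minutes later
```

Only content routed as "non-duplicate" would continue on to HBase. Note that the hash covers every byte, so if any part of the XML changes between fetches, the content counts as new.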

 

Hope this helps,

Matt

Re: Rssfeed (xml) check pubDate before ingesting data

@MattWho, thanks for the response. I've looked at that. I've been trying to get it to work both with the HashContent and DetectDuplicate processors together and with DetectDuplicate alone, without the HashContent processor, but I must be doing something wrong. Is there an example or something I could look at?

Is there a way to upload my current NiFi flow?
