Apache NiFi: Handling a long-running task?

Hi everyone -

I have a NiFi question I was hoping someone could help with. I have a sequence of steps as follows:

1) A process group extracting data from multiple sources, merging it and storing it in a Mongo datastore.

2) A Python script that operates on this collection (de-duplication / record linkage) and writes its results to a separate collection.

3) Finally, another process group that reads this new collection from Mongo and publishes it to Elasticsearch.
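As a rough illustration of step 2, here is a minimal Python sketch of exact-match de-duplication on a normalised key. It assumes hypothetical `name` and `email` fields and works on plain dicts rather than a Mongo cursor. Real record linkage usually needs fuzzy matching, so treat this as a stand-in for the shape of the job, not the actual script.

```python
import re
from typing import Iterable


def normalise_key(record: dict) -> tuple:
    """Build a match key from hypothetical 'name' and 'email' fields."""
    # Collapse runs of whitespace and ignore case so trivial variants match.
    name = re.sub(r"\s+", " ", str(record.get("name", ""))).strip().lower()
    email = str(record.get("email", "")).strip().lower()
    return (name, email)


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first record per key, filling its empty fields from later duplicates."""
    merged: dict[tuple, dict] = {}
    for record in records:
        key = normalise_key(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        target = merged[key]
        for field, value in record.items():
            # Only fill gaps; never overwrite a value already present.
            if value not in (None, "") and target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())
```

For example, `{"name": "Ada  Lovelace", "email": "ADA@example.com", "phone": ""}` and `{"name": "ada lovelace", "email": "ada@example.com", "phone": "555-0100"}` collapse into one record that keeps the first name spelling and picks up the phone number from the second.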

I'm not convinced `ExecuteScript` is the nicest way of handling step 2, as the job could take an hour or two to run, and debugging the runs and getting visibility into what the script is doing seems quite brittle.
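One way to get more visibility than `ExecuteScript` offers is to run the job as an ordinary external process (for example launched by NiFi's `ExecuteStreamCommand` processor, or by an external scheduler) that logs progress to stderr, prints a one-line JSON summary to stdout, and signals failure through its exit code. The sketch below assumes that design; `run_job`, `batched`, and the batch callback are hypothetical names, and the generated records stand in for reads from the Mongo collection.

```python
import json
import logging
import sys
import time
from typing import Callable, Iterable, Iterator

log = logging.getLogger("dedup_job")


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of up to `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_job(records: Iterable[dict], process_batch: Callable[[list], int],
            batch_size: int = 1000) -> dict:
    """Process records in batches, logging progress so a long run is observable."""
    started = time.monotonic()
    seen = written = 0
    for batch_no, batch in enumerate(batched(records, batch_size), start=1):
        written += process_batch(batch)  # callback returns how many records it wrote
        seen += len(batch)
        log.info("batch %d done: %d read, %d written", batch_no, seen, written)
    return {"records_read": seen, "records_written": written,
            "elapsed_seconds": round(time.monotonic() - started, 3)}


def main() -> int:
    logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    try:
        # Hypothetical stand-ins: replace with reads from / writes to Mongo.
        records = ({"id": i} for i in range(2500))
        summary = run_job(records, process_batch=len, batch_size=1000)
    except Exception:
        log.exception("job failed")
        return 1  # non-zero exit code tells the caller the run failed
    # One JSON line on stdout that a downstream step can parse.
    print(json.dumps({"status": "ok", **summary}))
    return 0
```

Saved as a script, it would end with `sys.exit(main())` so the exit code reflects success or failure. Logging once per batch means a run lasting an hour or two leaves a trail showing how far it got before any failure.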

Does anyone have any ideas about a nicer way of handling this?

I had a look at Wait/Notify, but I'm not quite sure it fits my needs, nor how I'd signal to the script that the extract was done, and similarly tell the publish step to read from the new collection.
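An alternative to Wait/Notify, sketched here as an assumption rather than a built-in NiFi mechanism, is a small status record per run that each step reads and updates: step 1 marks the run `extracted`, the script only starts on an `extracted` run and records the output collection name when it finishes, and step 3 reads that name. The in-memory `StatusStore` below is a hypothetical stand-in for a Mongo status collection; there, `advance` would be a single conditional update whose filter matches both the run ID and the expected stage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunStatus:
    """One entry in a hypothetical 'job_runs' collection, keyed by run_id."""
    run_id: str
    stage: str = "extracting"  # extracting -> extracted -> deduplicated
    output_collection: Optional[str] = None
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class StatusStore:
    """In-memory stand-in for the status collection."""

    def __init__(self) -> None:
        self._runs: dict[str, RunStatus] = {}

    def start_run(self, run_id: str) -> None:
        self._runs[run_id] = RunStatus(run_id)

    def advance(self, run_id: str, expected: str, new: str,
                output_collection: Optional[str] = None) -> None:
        """Move a run to the next stage only if it is currently in `expected`."""
        run = self._runs[run_id]
        if run.stage != expected:
            raise RuntimeError(f"run {run_id} is {run.stage!r}, expected {expected!r}")
        run.stage = new
        if output_collection is not None:
            run.output_collection = output_collection
        run.updated_at = datetime.now(timezone.utc)

    def get(self, run_id: str) -> RunStatus:
        return self._runs[run_id]
```

In use, step 1 would call `advance(run_id, "extracting", "extracted")` when the extract finishes, the script would call `advance(run_id, "extracted", "deduplicated", output_collection=...)` when it is done, and step 3 would read `get(run_id).output_collection` to decide which collection to publish. The stage check stops a step from running twice or out of order.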

Thanks,

Gavin.