I have a requirement where we have a template which uses SQL as source and SQL as destination and data would be more than 100GB for each table so here template will be instantiated multiple times based on tables to be migrated and also each table is partitioned into multiple flowfiles. How do we know when the process is completed? As here there will be multiple flowfiles we are unable to conclude as it hits end processor.
I have tried using SitetoSiteStatusReportingTask to check queue count, but it provides count based on connection and its difficult to fetch connectionid for each connection then concatenate as we have large number of templates. Here we have another problem in reporting task as it provides data on all process groups which are available on NIFI canvas which will be huge data if all templates are running and may impact in performance even though I used avro schema to fetch only queue count and connection id.
Can you please suggest some ideas and help me to achieve this?
Hi...to know which which flowfile completed, you can use a putemail processor to get an email when a particular flowfiles is finished. You can make it dynamic using db.table.name attribute which is added by generatetablefetch...if you have a lot of flowfiles for a single table, you can merge the flowfiles using mergecontent on tablename to give you periodic or batch completion status.
Another way could be to write success and failures to for e.g hive table and you can check the table for completions and failure.
Hope this helps. If the comment helps you to find a solution or move forward, please accept it as a solution for other community members.