NiFi: Merge files based on attribute and send email notification

Contributor

I have a requirement to send a notification once the files at the destination have been processed.

  1. The pipeline is as follows: ListSFTP --> UpdateAttribute (adds a specific attribute for each state/province) --> send to cloud storage.
  2. Each state has a different number of files.
  3. Once the files are sent to cloud storage, notify the end user by email, once per state.

I have tried merging files based on a common attribute, but I need a single file for each state so that I can send one notification per state.

1 ACCEPTED SOLUTION

Master Mentor

@Jagapriyan 

Since this is a daily job, I would suggest tackling it differently.
You know your source files are written between 8am and 9am each day.
So I would configure ListSFTP to run on a cron schedule so that it runs every second from 9am to 10am, which makes sure all files are listed. Then, knowing that your files may number 90+ (with no known maximum), I would set "Minimum Number of Entries" on MergeContent to a value you know the count will never reach. Make sure "Maximum Number of Entries" is set to a value higher than that. Then set "Max Bin Age" to some duration, perhaps 30 minutes.

This allows MergeContent to keep allocating FlowFiles to a bin for 30 minutes, at which point the bin is forced to merge even if the minimum has not been reached. Doing this makes sure you get only one FlowFile out per bin per node. That single FlowFile can then be used to trigger the PutEmail processor you use for notification. Additionally, the merged FlowFile will have a "merge.count" attribute added, which you can use in your email body to report the number of FlowFiles that were ingested.
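To make the binning behaviour described above concrete, here is a minimal Python sketch that imitates it outside NiFi. It is not NiFi code: the function `merge_by_state`, the attribute name `state`, and the counts are all hypothetical, and it assumes MergeContent's "Correlation Attribute Name" is set to the per-state attribute so that each state gets its own bin.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """One MergeContent-style bin: FlowFiles sharing a correlation value."""
    created_at: float                      # arrival time of the first entry (seconds)
    entries: list = field(default_factory=list)


def merge_by_state(flowfiles, min_entries, max_bin_age, now):
    """Return one merged record per state whose bin is ready to merge.

    flowfiles   -- (arrival_time, attributes) tuples, in arrival order
    min_entries -- deliberately higher than any real file count
    max_bin_age -- seconds after which a bin merges regardless of size
    now         -- current time in seconds
    """
    bins = {}
    for arrival, attrs in flowfiles:
        state = attrs["state"]             # hypothetical correlation attribute
        bins.setdefault(state, Bin(created_at=arrival)).entries.append(attrs)

    merged = []
    for state, b in bins.items():
        full = len(b.entries) >= min_entries
        expired = now - b.created_at >= max_bin_age
        if full or expired:
            # One output per bin, carrying the number of merged entries.
            merged.append({"state": state, "merge.count": len(b.entries)})
    return merged


# 90 files for US and 100 for Canada, all arriving within the first 100 seconds.
files = [(i, {"state": "US"}) for i in range(90)]
files += [(i, {"state": "CA"}) for i in range(100)]

early = merge_by_state(files, min_entries=10_000, max_bin_age=1800, now=600)
late = merge_by_state(files, min_entries=10_000, max_bin_age=1800, now=1800)
```

Before the bin age elapses nothing is emitted, because the minimum is never reached; once it elapses, each state yields exactly one record, which is what would drive one email per state. In the real flow, the PutEmail message can reference the count through NiFi Expression Language as `${merge.count}`.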


Thank you,

Matt

REPLIES

Master Mentor

@Jagapriyan 

The flow you describe above does not mention the MergeContent processor, which is what you would need to merge multiple FlowFiles with matching attribute values into one output FlowFile.

Share your MergeContent processor configuration.

Additionally, the ListSFTP processor does not download the content of the files from the remote server. It only lists the files on the remote server and sets attributes on each FlowFile, which the FetchSFTP processor then uses to actually download the content.

How do you know when you have all the files for a given state? Is this a continuous feed of files? Is this a daily job? While the file count differs between states, is the count the same each time for a given state? What are the highest and lowest counts?
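The list-then-fetch split can be pictured with a small Python sketch. This only imitates the idea and is not how NiFi implements it: the `REMOTE` dictionary stands in for the SFTP server, and `list_sftp` / `fetch_sftp` are hypothetical helper names.

```python
# Hypothetical stand-in for the remote SFTP server: path -> file bytes.
REMOTE = {
    "/reports/us_001.csv": b"id,status\n1,ok\n",
    "/reports/ca_001.csv": b"id,status\n7,ok\n",
}


def list_sftp(remote):
    """Emit one FlowFile per remote file: attributes only, empty content."""
    flowfiles = []
    for full_path in sorted(remote):
        path, filename = full_path.rsplit("/", 1)
        flowfiles.append(
            {"attributes": {"path": path, "filename": filename}, "content": b""}
        )
    return flowfiles


def fetch_sftp(flowfile, remote):
    """Use the listing attributes to pull the actual bytes for one FlowFile."""
    attrs = flowfile["attributes"]
    key = f'{attrs["path"]}/{attrs["filename"]}'
    return {**flowfile, "content": remote[key]}


listed = list_sftp(REMOTE)
fetched = [fetch_sftp(ff, REMOTE) for ff in listed]
```

After the listing step every FlowFile is still empty; content only exists after the fetch step, which is why a flow that stores files in cloud storage needs both processors.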

Thanks,
Matt

Contributor

Hi @MattWho 

Thanks for taking the time to respond.

How do you know when you have all the files for a given state? Usually, the files are status reports for the previous report. They are generated between 8am and 9am every day.

For example: if I have two countries, US and Canada, there will be 90 files for US and 100 files for Canada (the number of files may vary based on usage the day before).

The flow: We have a ListSFTP processor and a FetchSFTP processor that collect these files and store them on GCP.

The requirement: I need to send an email notifying that the files have been delivered.

What I tried: Since I need to send one email for each country, I tried the ReplaceText processor to remove the contents of each FlowFile and then merge the files into one. Since I can't use a hardcoded number, I couldn't set a value for the minimum number of entries.

My current MergeContent processor configuration:

[Screenshot: Jagapriyan_0-1665664863474.png]

Contributor

Thank you @Matt