
Merge Queue Attributes, but not the content

Contributor

Hi guys,

I've got a queue containing ~1000 files, and I want to receive a mail with all ~1000 filenames. I don't care about the content of the files; only the ${filename} should be put on a list and sent as the content.

Example:

- Queue Position 1 - filename: first.xml

- Queue Position 2 - filename: second.xml

- Queue Position 3 - filename: third.xml

Resulting mail content (possibility 1):

- first.xml

- second.xml

- third.xml

Resulting mail content (possibility 2):

first.xml, second.xml, third.xml



Basically, just a list of the filenames in the queue.

Does anybody have an idea how to solve this without much API magic?

Greetings and best regards - Max


4 REPLIES

Super Guru

Hi,

I'm not aware of any out-of-the-box processor that can help you with that. A suggestion would be to write custom code in an ExecuteScript processor (a rough sketch is below) that gets the filename attribute and stores it, in the required format, in a file in some staging directory. In this processor you can also decide how many flowfiles you want to process per file; once that file reaches the limit (which could be by file size or by number of entries), you move it to a final directory that a GetFile processor is reading from, and direct the content of the read file (the filenames) to PutEmail. A new file will be created in the staging area for any new entries after the older file has been moved to the final directory.
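
A minimal ExecuteScript (Jython) body along these lines could do it; the paths, rollover limit, and file naming here are assumptions to adapt, not tested code:

    import os, time, shutil

    STAGING_FILE = '/data/staging/filenames.txt'   # assumed staging location
    FINAL_DIR = '/data/final'                      # assumed directory GetFile watches
    MAX_ENTRIES = 1000                             # assumed rollover limit

    # 'session' and 'REL_SUCCESS' are provided by the ExecuteScript processor
    flowFile = session.get()
    if flowFile is not None:
        # append this FlowFile's filename to the staging file
        f = open(STAGING_FILE, 'a')
        f.write(flowFile.getAttribute('filename') + '\n')
        f.close()

        # once the staging file holds enough entries, move it to the final
        # directory (timestamped so an older file is not overwritten)
        count = sum(1 for line in open(STAGING_FILE))
        if count >= MAX_ENTRIES:
            shutil.move(STAGING_FILE,
                        os.path.join(FINAL_DIR, 'filenames-%d.txt' % time.time()))

        session.transfer(flowFile, REL_SUCCESS)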

Super Guru

Hi,

Another option, which doesn't involve writing custom code, is to use the AttributesToCSV and MergeContent processors as follows:

1- AttributesToCSV: This will convert an attribute - filename in your case - to CSV format, which will result in a single-value flowfile. Make sure to set the Destination property to "flowfile-content".

2- MergeContent: This will merge the content of the flowfiles from above into a single file, depending on your merge strategy and other properties like Maximum Number of Entries. You can also set the Demarcator, whether you want a newline or a comma (see the illustration below).

3- PutEmail: This will put the merged content from above into an email and send it.

I think this is an easier and more straightforward solution than the suggestion above.
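
For illustration, with the three example files from the question, AttributesToCSV (Destination = "flowfile-content", Attribute List = filename) should leave each flowfile containing just its filename, and MergeContent with a newline demarcator should then produce a merged flowfile like:

    first.xml
    second.xml
    third.xml

which matches possibility 1 from the question; a comma demarcator should give possibility 2.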

Hope that helps.


Master Mentor

@mbraunerde 

Assuming you do not want to lose the original content of all these files, you have numerous challenges here.

1. You don't have a known number of files. So when collecting a single list of all filenames, how do you know that all of them have been received in the queue? NiFi is designed as a data-in-motion service.
2. Preserving the original FlowFiles' content. It sounds like you are trying to produce a new FlowFile whose content contains just the filenames of all the files received by your NiFi dataflow, while still allowing the original FlowFiles, along with their original content, to be processed separately?

Overcoming these challenges depends on some unknowns at this point.

1. How is this data ingested to your NiFi? Is it a constant stream of data? Is it a burst of data once a day? If you can control the ingest and there is a known gap between streams of data, you may be able to overcome challenge 1 above.

2. Overcoming challenge 2 can be done via cloning of the FlowFiles. Every NiFi processor has outbound relationships that can be added to NiFi connections or auto-terminated within a processor's configuration. So at some point in your flow you would simply add the "success" relationship of a processor to two different connections; essentially one connection will have the original FlowFile and the other will have the clone. Down one dataflow path you continue to handle the FlowFiles with their original content. On the other dataflow path you can use a ReplaceText processor with a literal replace replacement strategy and the Replacement Value set to ${filename}; this will replace the entire content of that FlowFile with just its filename. Then, as @SAMSAL suggested, use a MergeContent processor to merge all your FlowFiles so you have one new FlowFile containing all the filenames. Since you are dealing with an unknown number of files, you could configure the MergeContent with an arbitrarily large Minimum Number of Entries (some value larger than you would expect to receive in a single batch). You would also need to set Maximum Number of Entries to a value equal to or larger than the min. This will cause FlowFiles to continue to be added to a bin without actually being merged. Then you set Max Bin Age to a value high enough that the whole batch of FlowFiles would have been processed; Max Bin Age serves as a method to force a bin to merge even if the min values have not been reached after a configured amount of time. So you are building a delay into this flow to allow for the data-in-motion nature of NiFi (example settings below). Finally, send that merged FlowFile to your PutEmail processor.
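
For illustration, the two processors on the clone path might be configured along these lines; the numeric values are assumptions you would size to your own batches, not recommendations:

    ReplaceText
        Replacement Strategy = Literal Replace
        Replacement Value    = ${filename}

    MergeContent
        Merge Strategy            = Bin-Packing Algorithm
        Minimum Number of Entries = 10000     (larger than any expected batch)
        Maximum Number of Entries = 10000
        Max Bin Age               = 10 min    (forces the merge once the batch has arrived)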

Or maybe we are not understanding the use case completely. Are you looking for what is actually in a given queue, in positional order? Keep in mind that NiFi is a data-in-motion service, meaning that it should not be used to hold data in queues, which in turn means that the queued FlowFiles in a connection are typically constantly changing. But if this is what you are looking for, you could use the InvokeHTTP processor to obtain the listing of FlowFiles in a queue. This would require a series of rest-api calls. First, an InvokeHTTP would make a POST request to generate a queue listing result set for a connection from all nodes. The response to that POST contains the URL of the result set, which you would use in a second InvokeHTTP to GET that result set. Finally, you would need a third InvokeHTTP to DELETE the result set so it is not left hanging around in NiFi heap memory. Even then you have a large JSON which contains a lot more than just position, filename, and NiFi cluster host names, so you would use additional processors to parse the desired information from that JSON record.
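
As a rough sketch of those three REST calls outside of NiFi - assuming an unsecured NiFi on localhost and a placeholder connection id, with field names as returned by the listing-requests endpoint - something like this Python script could pull the filenames:

    import time
    import requests

    BASE = 'http://localhost:8080/nifi-api'   # assumed unsecured NiFi address
    CONNECTION_ID = 'your-connection-uuid'    # placeholder connection id

    # 1. POST to create a queue listing request for the connection
    resp = requests.post('%s/flowfile-queues/%s/listing-requests'
                         % (BASE, CONNECTION_ID))
    listing = resp.json()['listingRequest']

    # 2. GET the result set, polling until NiFi marks it finished
    while not listing['finished']:
        time.sleep(1)
        listing = requests.get(listing['uri']).json()['listingRequest']

    for summary in listing['flowFileSummaries']:
        print(summary['position'], summary['filename'])

    # 3. DELETE the listing request so it is not left in NiFi heap memory
    requests.delete(listing['uri'])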

If this response assisted with your query, please take a moment to log in and click "Accept as Solution" below this post.

Thank you,

Matt

Contributor

Hi @SAMSAL,

thank you for your help; you're absolutely right in your second post - the first one seems very "uncomfortable".

@MattWho: I really don't care about the content; I've created backup files in the stages before mailing, and only the consumed filenames are relevant for the mail. Last time there were 350 files and 35 MB of backup 😉