MergeContent Issue Nifi Apache

New Contributor

Hi everyone,

I'm trying to merge error logs generated by ExecuteSQL and send them via email. Currently, I have the following flow:

ExecuteSQL - failure -> MergeContent - merged -> PutEmail.

If I disable MergeContent, the logs are sent as they should be, but as an individual email for each error log. If I enable MergeContent, I receive a single email; however, it duplicates the first error.

ERROR: syntax error at or near ""1qa""
Where: PL/pgSQL function "extract".truncate_tables(character varying) line 5 at EXECUTEERROR: syntax error at or near ""1qa""
Where: PL/pgSQL function "extract".truncate_tables(character varying) line 5 at EXECUTEERROR: syntax error at or near ""1qa""
Where: PL/pgSQL function "extract".truncate_tables(character varying) line 5 at EXECUTEERROR: syntax error at or near ""1qa""

Here are the settings that I use:

[Screenshot of the MergeContent settings: need_help_0-1695212486770.png]

I also added a LogMessage processor, and the flow for the original relationship shows the correct content. The problem seems to be with the merged logs. Can someone pinpoint where the problem is or how I can fix this issue?

2 ACCEPTED SOLUTIONS


@need_help, try replacing MergeContent with MergeRecord. I assume that each error log is generated in its own FlowFile. With MergeRecord you can achieve something similar, but you will need to create two Controller Services: one for CSV reading and one for CSV writing, both using Inherit Schema. You can then group as many records as you like and send them to your PutEmail processor. This is how I have used it so far, and it works well for my use case.
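
To illustrate the idea of record-based merging, here is a minimal Python sketch. It is not NiFi code and does not use any NiFi API. It only mimics conceptually what a CSV reader and CSV writer pair does when several FlowFiles are merged: each input is parsed into records, and the records are written out once under a single header. The function name `merge_csv_records` and the `message` column are assumptions made up for this example.

```python
import csv
import io
from typing import List


def merge_csv_records(flowfiles: List[str]) -> str:
    """Parse each CSV input into records and write them out under one header.

    Hypothetical helper: a rough stand-in for record-oriented merging,
    where the writer reuses (inherits) the schema seen by the reader.
    """
    out = io.StringIO()
    writer = None
    for content in flowfiles:
        reader = csv.DictReader(io.StringIO(content))
        for record in reader:
            if writer is None:
                # The first record fixes the output schema, and the header is written once.
                writer = csv.DictWriter(
                    out, fieldnames=reader.fieldnames, lineterminator="\n"
                )
                writer.writeheader()
            writer.writerow(record)
    return out.getvalue()


# Two made-up "FlowFiles", each holding one CSV record with a header row.
flowfiles = ["message\nfirst error\n", "message\nsecond error\n"]
merged = merge_csv_records(flowfiles)
print(merged)
```

The point of the sketch is that the merged output keeps one record per line instead of gluing raw bytes together, which is the main difference from a plain binary concatenation.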


Super Mentor

@need_help 
100% agree with @cotopaul about using MergeRecord here.
But I thought I would explain what is happening with your current dataflow.

The MergeContent processor is intended to merge the content of multiple FlowFiles into a single FlowFile. You have configured Binary Concatenation, which appends the content of each FlowFile allocated to the same bin end to end, so after the merge you end up with one FlowFile containing the content of multiple source FlowFiles. In your dataflow, ExecuteSQL is routing many FlowFiles to the "failure" relationship, all with the same error.

When MergeContent is scheduled to execute, it adds FlowFiles to bin(s) based on its configuration and then merges any bin that meets all configured minimums. Here you have the minimum number of entries set to "1" and the maximum number of entries set to "1000". So, depending on how many FlowFiles are queued in the inbound failure connection when MergeContent executes, a bin can hold anywhere between 1 and 1000 FlowFiles, which are then merged into one larger FlowFile with all of that content concatenated. MergeContent is behaving exactly as you have it configured, and the output is expected.
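
The binning and concatenation described above can be sketched in a few lines of Python. This is a simplified model, not NiFi's implementation: the names `fill_bin` and `binary_concat` are hypothetical, and it assumes no demarcator is set on MergeContent, which the run-together "EXECUTEERROR" text in the question suggests but the thread does not confirm.

```python
from typing import List


def fill_bin(queue: List[bytes], min_entries: int, max_entries: int) -> List[bytes]:
    """Take up to max_entries items from the queue; return [] if below min_entries.

    Hypothetical helper modelling a single bin, not the real NiFi bin manager.
    """
    candidate = queue[:max_entries]
    if len(candidate) < min_entries:
        return []
    del queue[:len(candidate)]  # the binned items leave the inbound queue
    return candidate


def binary_concat(contents: List[bytes], demarcator: bytes = b"") -> bytes:
    """Append each item's content end to end, optionally separated by a demarcator."""
    return demarcator.join(contents)


# The same error routed to "failure" three times, as in the question.
error = (
    b'ERROR: syntax error at or near ""1qa""\n'
    b'Where: PL/pgSQL function "extract".truncate_tables(character varying) '
    b"line 5 at EXECUTE"
)
queue = [error, error, error]

merged_bin = fill_bin(queue, min_entries=1, max_entries=1000)
merged = binary_concat(merged_bin)                # no separator between entries
separated = binary_concat(merged_bin, b"\n")      # one entry per block of lines

print(merged.decode())
```

With a minimum of 1, whatever is queued at execution time is eligible to merge, so three identical failures produce one FlowFile that repeats the same error three times. Passing a newline as the demarcator in this sketch keeps each entry on its own lines instead of running "EXECUTE" into the next "ERROR".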

If you want a single email for every error, I am not clear on why you are using the mergeContent processor at all.


If you found that any of the suggestions/solutions provided helped you with your issue, please take a moment to log in and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt


Community Manager

@need_help Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.  Thanks.


Regards,

Diana Torres,
Community Moderator

