How can I merge big XML files with attributes in tags using NiFi MergeRecord?

Hi,

I have been trying to merge XML records that have attributes in their tags using the MergeRecord processor. For the controller services I am using an XMLReader and a writer with no special configuration other than infer schema and schemaName. I suspect my configuration is wrong, but I am not sure what I need to fix. Any help would be much appreciated.

Thanks 🙂

@prparadise 

The NiFi MergeRecord processor assigns the FlowFiles queued on its inbound connections to bins. Bins can only contain "like FlowFiles". For two FlowFiles to be considered "like FlowFiles", they must have the same schema (as identified by the Record Reader) and, if the <Correlation Attribute Name> property is set, the same value for the specified attribute.
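To make the "like FlowFiles" rule concrete, here is a rough Python sketch of the binning idea. It is only a simplified model, not NiFi's actual code: the `assign_to_bins` function, the dictionary layout, the schema strings, and the `batch` attribute are all made up for illustration.

```python
from collections import defaultdict

def assign_to_bins(flowfiles, correlation_attribute=None):
    """Group FlowFiles into bins keyed by (schema, correlation value).

    Hypothetical model: each FlowFile is a dict with a "name", the
    "schema" the reader identified, and its "attributes".
    """
    bins = defaultdict(list)
    for ff in flowfiles:
        # The correlation value only matters when the property is set.
        correlation = (
            ff["attributes"].get(correlation_attribute)
            if correlation_attribute
            else None
        )
        bins[(ff["schema"], correlation)].append(ff["name"])
    return dict(bins)

flowfiles = [
    {"name": "a.xml", "schema": "id:int,name:string", "attributes": {"batch": "1"}},
    {"name": "b.xml", "schema": "id:int,name:string", "attributes": {"batch": "1"}},
    # Same attribute value, but the inferred type of "id" differs.
    {"name": "c.xml", "schema": "id:string,name:string", "attributes": {"batch": "1"}},
]

bins = assign_to_bins(flowfiles, correlation_attribute="batch")
for key, names in bins.items():
    print(key, names)
```

In this sketch a.xml and b.xml share a bin, while c.xml ends up alone in a second bin because its schema differs, even though its correlation value matches.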

Initial thoughts:
1. Perhaps your source FlowFiles are resulting in unique inferred schemas. The XMLRecordSetWriter can be configured with a schema write strategy such as "Set 'avro.schema' attribute" so that the output merged FlowFile has the schema added to an attribute (this would allow you to compare the inferred schema across multiple FlowFiles to see if they match).
2. Perhaps the min number of records per bin is still set to 1. When the Merge type processors execute, they look at an inbound connection and allocate queued FlowFiles to bin(s). At the end of binning, the processor checks whether any bin is eligible for merge. This processor can execute very fast and frequently. Let's say that each time it executes, the inbound connection only contains 1 FlowFile. Since min records per bin is 1, a bin with only one FlowFile would get merged. Try setting the min records to a higher value. Whenever you change the "min" settings, you should also set the "Max Bin Age" property. This forces a bin to merge after the configured amount of time even if the min values are not met.
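The interaction between the min records setting and "Max Bin Age" in point 2 can be sketched roughly like this. Again, this is a simplified model under my own assumptions, not the processor's real logic: `bin_is_ready` and its parameters are hypothetical names, and the real processor has other settings that are ignored here.

```python
def bin_is_ready(record_count, bin_age_seconds, min_records,
                 max_bin_age_seconds=None):
    """Return True if a bin would be merged in this simplified model."""
    # Enough records collected: the bin is eligible for merge.
    if record_count >= min_records:
        return True
    # Otherwise only an expired Max Bin Age forces the merge.
    if max_bin_age_seconds is not None and bin_age_seconds >= max_bin_age_seconds:
        return True
    return False

# Min of 1: a bin holding a single record merges right away.
print(bin_is_ready(1, 0, min_records=1))                             # True
# Higher min: the bin keeps waiting for more records.
print(bin_is_ready(1, 5, min_records=1000, max_bin_age_seconds=60))  # False
# Max Bin Age reached: the bin merges even though min is not met.
print(bin_is_ready(1, 60, min_records=1000, max_bin_age_seconds=60)) # True
# No Max Bin Age in this model: an under-filled bin never becomes ready.
print(bin_is_ready(1, 3600, min_records=1000))                       # False
```

The last case is why it is worth setting "Max Bin Age" whenever you raise the min values.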

Thank you,

Matt