How can I merge big XML files with attributes in tags using NiFi MergeRecord
- Labels: NiFi Registry
Created ‎08-31-2022 01:39 PM
Hi,
I have been trying to merge XML records that have attributes in their tags using the MergeRecord processor. For the controller services I am using an XMLReader and an XML writer with no special configuration other than infer schema and schemaName. I suspect the configuration is wrong, but I am not sure what I need to fix. Any help will be much appreciated.
Thanks 🙂
Created ‎09-22-2022 06:53 AM
@prparadise
The NiFi MergeRecord processor assigns queued FlowFiles on its inbound connections to bins. A bin can only contain "like FlowFiles". For two FlowFiles to be considered "like FlowFiles", they must have the same schema (as identified by the Record Reader) and, if the <Correlation Attribute Name> property is set, the same value for the specified attribute.
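To illustrate the "like FlowFiles" rule, here is a minimal Python sketch of the binning idea. It is a simplified model and not NiFi's actual implementation: the `assign_to_bins` function, the dictionary layout, the schema strings, and the `batch` attribute are all hypothetical names made up for the example.

```python
from collections import defaultdict


def assign_to_bins(flowfiles, correlation_attribute=None):
    """Group FlowFiles into bins keyed by schema and, optionally, a correlation value.

    Simplified model: each FlowFile is a dict with a 'schema' string (standing in
    for whatever the Record Reader identified) and an 'attributes' dict.
    """
    bins = defaultdict(list)
    for ff in flowfiles:
        key = [ff["schema"]]
        if correlation_attribute is not None:
            # FlowFiles must also share the same value for this attribute.
            key.append(ff["attributes"].get(correlation_attribute))
        bins[tuple(key)].append(ff["name"])
    return dict(bins)


# Hypothetical queue: c.xml infers 'id' as a string, so its schema differs.
queued = [
    {"name": "a.xml", "schema": "id:int,name:string", "attributes": {"batch": "1"}},
    {"name": "b.xml", "schema": "id:int,name:string", "attributes": {"batch": "1"}},
    {"name": "c.xml", "schema": "id:string,name:string", "attributes": {"batch": "1"}},
]

bins = assign_to_bins(queued, correlation_attribute="batch")
for key, names in bins.items():
    print(key, names)
```

In this model, a single differing inferred field type is enough to send `c.xml` to its own bin, which is why inferred schemas are worth checking first.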
Initial thoughts:
1. Perhaps your source FlowFiles are resulting in unique inferred schemas. The XMLRecordSetWriter can be configured with a schema write strategy such as "Set 'avro.schema' attribute" so that the merged output FlowFile has the schema added as an attribute (this would let you compare the inferred schema across multiple FlowFiles to see whether they match).
2. The min number of records per bin is still set to 1. When Merge-type processors execute, they look at the inbound connection and allocate queued FlowFiles to bin(s). At the end of binning, the processor checks whether any bin is eligible for merge. This processor can execute very fast and frequently. Let's say that each time it executes, the inbound connection contains only 1 FlowFile. Since min records per bin is 1, a bin with only one FlowFile would get merged. Try setting the min records to a higher value. Whenever you change the "min" settings, you should also set the "Max Bin Age" property. This forces a bin to merge after the configured amount of time even if the min values are not met.
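The interaction between the min records setting and "Max Bin Age" described in point 2 can be sketched roughly as follows. This is a simplified Python model that only covers the two settings discussed here; the real processor evaluates additional properties, and the function and parameter names are hypothetical.

```python
def bin_is_ready(record_count, bin_age_seconds, min_records, max_bin_age_seconds=None):
    """Return True if a bin would be merged under this simplified model.

    A bin merges when it holds at least min_records, or when it has aged past
    max_bin_age_seconds (if a max bin age is configured at all).
    """
    if record_count >= min_records:
        return True
    if max_bin_age_seconds is not None and bin_age_seconds >= max_bin_age_seconds:
        return True
    return False


# With min records = 1, a bin holding a single record merges immediately.
print(bin_is_ready(record_count=1, bin_age_seconds=0, min_records=1))

# With a higher minimum, the bin waits for more records ...
print(bin_is_ready(record_count=1, bin_age_seconds=5, min_records=100,
                   max_bin_age_seconds=30))

# ... until the max bin age forces the merge.
print(bin_is_ready(record_count=1, bin_age_seconds=30, min_records=100,
                   max_bin_age_seconds=30))

# Without a max bin age, an under-filled bin in this model never becomes ready.
print(bin_is_ready(record_count=1, bin_age_seconds=1000, min_records=100))
```

The last call shows why raising the minimum without setting a max bin age can leave FlowFiles sitting in a bin indefinitely.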
If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.
Thank you,
Matt
