
MergeRecord based on schema: only merge records with the same schema

My use-case is:

 

1) Have API credentials

2) Use UpdateAttribute to set (a) the schema and (b) the S3 bucket/location for each report in my list of reports

3) Query the API endpoint for the report

4) Follow the API endpoint's pagination to fetch the remaining records

5) Merge the records with MergeRecord

6) Save the result to S3
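To make the shape of the problem concrete, here is a minimal Python sketch of steps 2 to 4, outside NiFi. It assumes a hypothetical page-fetching function that returns a page of records plus a cursor, hypothetical report configs, and a hypothetical `s3.location` attribute name (`schema.name` is the attribute used in the answer below); for simplicity the report is looked up by its schema name. It only models how every page from every report ends up in the single queue that feeds the shared MergeRecord.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical API client signature: (report name, cursor) -> (records, next cursor).
# A next cursor of None means the last page has been reached.
PageFetcher = Callable[[str, Optional[str]], Tuple[List[dict], Optional[str]]]

# Step 2: hypothetical per-report attributes, as UpdateAttribute might set them.
REPORTS = [
    {"schema.name": "orders", "s3.location": "s3://example-bucket/orders/"},
    {"schema.name": "customers", "s3.location": "s3://example-bucket/customers/"},
]


def paginate_report(report: Dict[str, str], fetch_page: PageFetcher) -> Iterator[dict]:
    """Steps 3-4: yield one FlowFile-like dict per page, carrying the report's attributes."""
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(report["schema.name"], cursor)
        yield {"attributes": dict(report), "records": records}
        if cursor is None:
            break


# Fake API for illustration only: each report has a fixed list of pages.
FAKE_PAGES = {
    "orders": [[{"id": 1}, {"id": 2}], [{"id": 3}]],
    "customers": [[{"name": "a"}]],
}


def fake_fetch_page(report_name: str, cursor: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Return the page at the cursor position and the cursor of the next page, if any."""
    page_index = int(cursor) if cursor is not None else 0
    pages = FAKE_PAGES[report_name]
    next_cursor = str(page_index + 1) if page_index + 1 < len(pages) else None
    return pages[page_index], next_cursor


# Every page from every report lands in the one queue feeding the shared MergeRecord.
merge_queue = [ff for report in REPORTS for ff in paginate_report(report, fake_fetch_page)]
print([ff["attributes"]["schema.name"] for ff in merge_queue])
# ['orders', 'orders', 'customers']
```

In a real flow the pages from different reports would typically arrive interleaved, which is exactly the situation the merge step has to cope with.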

 

Since steps 3 to 6 are the same for every report, I'm re-using the same processors for all of them, as shown in the screenshot below. My problem is that MergeRecord (step 5) will try to merge records with different schemas together, which obviously doesn't work.

 

How can I restructure this?  I'd like to re-use processors as much as possible, but still be able to add more schemas as my needs evolve.

[Screenshot: the flow, with the same processors re-used for steps 3 to 6]

 

Accepted solution:


I set MergeRecord's Correlation Attribute Name property to `${schema.name}`, and it's working as expected.

 

From the MergeRecord documentation:

> If specified, two FlowFiles will be binned together only if they have the same value for this Attribute. If not specified, FlowFiles are bundled by the order in which they are pulled from the queue.
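The rule in the quote can be modeled in a few lines of Python. This is a simplified sketch of the documented behavior, not NiFi's actual MergeRecord implementation: it ignores bin thresholds, bin age, and the limit on concurrent bins, and the `FlowFile` class, the `bin_flowfiles` function, its `max_bin_size` parameter, and the sample schema names are all hypothetical. It contrasts queue-order bundling with binning on the `schema.name` attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi FlowFile: attributes plus records."""
    attributes: Dict[str, str]
    records: List[dict] = field(default_factory=list)


def bin_flowfiles(queue: List[FlowFile],
                  correlation_attribute: Optional[str] = None,
                  max_bin_size: int = 100) -> List[List[FlowFile]]:
    """Simplified model of the documented binning rule (not NiFi's code)."""
    if correlation_attribute is None:
        # No correlation attribute: bundle FlowFiles in the order they are
        # pulled from the queue, so different schemas can share a bin.
        return [queue[i:i + max_bin_size]
                for i in range(0, len(queue), max_bin_size)]

    # Correlation attribute set: group by its value (dicts keep insertion
    # order in Python 3.7+), so only FlowFiles with equal values share a bin.
    groups: Dict[Optional[str], List[FlowFile]] = {}
    for ff in queue:
        groups.setdefault(ff.attributes.get(correlation_attribute), []).append(ff)

    bins: List[List[FlowFile]] = []
    for members in groups.values():
        bins.extend(members[i:i + max_bin_size]
                    for i in range(0, len(members), max_bin_size))
    return bins


def schema_names(bins: List[List[FlowFile]]) -> List[List[str]]:
    """Show which schema.name values ended up in each bin."""
    return [[ff.attributes["schema.name"] for ff in b] for b in bins]


# Hypothetical queue: pages from two reports with different schemas, interleaved.
queue = [
    FlowFile({"schema.name": "orders"}, [{"id": 1}]),
    FlowFile({"schema.name": "customers"}, [{"name": "a"}]),
    FlowFile({"schema.name": "orders"}, [{"id": 2}]),
    FlowFile({"schema.name": "customers"}, [{"name": "b"}]),
]

print(schema_names(bin_flowfiles(queue, max_bin_size=4)))
# [['orders', 'customers', 'orders', 'customers']]
print(schema_names(bin_flowfiles(queue, "schema.name", max_bin_size=4)))
# [['orders', 'orders'], ['customers', 'customers']]
```

The sketch passes the attribute by name (`schema.name`), matching the quoted description's wording ("the same value for this Attribute"). Depending on the NiFi version, it may be worth checking whether the property expects that bare attribute name rather than the Expression Language reference `${schema.name}`.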
