I am trying to merge two CSV files. My goal is to do an upsert: if a record is new, append it to the existing records; if it already exists, ignore it. What actually happens is that all the records are simply appended to the existing ones, so I am getting duplicate records. Does anyone have an idea how to fix this? Your help is much appreciated.
The MergeContent processor won't perform any upsert operation. It is used to merge (append) flowfiles into a new flowfile based on its configuration.
MergeContent processor Documentation:
Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the Processor be configured with only a single incoming connection, as Group of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate.
If you are trying to do an upsert, use a Hive merge (the MERGE statement) instead, as it is designed for this kind of use case. There you define the logic for what should happen when a record already exists and when it does not.
Refer to this link for more details and an example of how to do the merge on the Hive side.
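To make the difference between a plain append and the upsert described in the question concrete, here is a minimal Python sketch. It runs outside NiFi and Hive and is only an illustration: the key column name `id`, the sample rows, and the helper names `read_csv` and `upsert_rows` are all assumptions, not something from the original flow.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text into a list of dict rows (header taken from the first line)."""
    return list(csv.DictReader(io.StringIO(text)))


def upsert_rows(existing, incoming, key="id"):
    """Append incoming rows whose key is new; ignore rows whose key already exists.

    The key column name "id" is an assumption made for this example.
    """
    seen = {row[key] for row in existing}
    merged = list(existing)
    for row in incoming:
        if row[key] not in seen:
            merged.append(row)
            seen.add(row[key])
    return merged


# Hypothetical sample data: the record with id 2 appears in both files.
existing = read_csv("id,name\n1,alice\n2,bob\n")
incoming = read_csv("id,name\n2,bob\n3,carol\n")

# Plain append: every incoming row is kept, so id 2 is duplicated.
appended = existing + incoming

# Upsert as described in the question: only the new record (id 3) is added.
upserted = upsert_rows(existing, incoming)

print(len(appended), len(upserted))
```

The plain append corresponds to the behaviour seen with MergeContent, while `upsert_rows` shows the "insert only when the key is not matched" logic that you would express on the Hive side.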