
NiFi MergeContent processor producing duplicate values

New Contributor

Hello @Shu @Peter Greiff @Alex Gauthier @Timothy Spann & Everyone!

I am trying to merge two CSV files. My goal is to do an upsert: if a record is new, append it to the existing records; if it already exists, ignore it. What actually happens is that all the records are appended to the existing records, so I end up with duplicate records. Does anyone have an idea how to fix this? Your help is much appreciated.

Thanks,

Iyyappan
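The requested behaviour (append a record only if its key is not already present, otherwise skip it) can be sketched in plain Python. This is a minimal sketch, not NiFi code. It assumes both files share the same header and that a column named `id` identifies a record; `id` is a hypothetical name, since the post does not say which column is the key.

```python
import csv
import io

# Hypothetical key column: the original post does not name the identifying column.
KEY_COLUMN = "id"


def merge_ignore_existing(existing_csv: str, incoming_csv: str, key: str = KEY_COLUMN) -> str:
    """Append rows from incoming_csv whose key is not already in existing_csv.

    Rows whose key already exists are skipped (the "ignore" half of the
    requested upsert). Both inputs are assumed to share the same header.
    """
    existing_reader = csv.DictReader(io.StringIO(existing_csv))
    incoming_reader = csv.DictReader(io.StringIO(incoming_csv))
    # Use the existing file's header; fall back to the incoming one if existing is empty.
    fieldnames = existing_reader.fieldnames or incoming_reader.fieldnames

    merged = list(existing_reader)
    seen = {row[key] for row in merged}
    for row in incoming_reader:
        if row[key] not in seen:
            merged.append(row)   # new record: append it
            seen.add(row[key])   # also drops repeats inside incoming_csv itself
        # otherwise the record already exists: ignore it

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged)
    return out.getvalue()
```

Calling `merge_ignore_existing("id,name\n1,alice\n2,bob\n", "id,name\n2,bob\n3,carol\n")` returns `"id,name\n1,alice\n2,bob\n3,carol\n"`: the row with `id` 2 is already present, so only the row with `id` 3 is appended.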


Super Guru
@Iyyappan S

The MergeContent processor won't perform any upsert operations: it merges (appends) FlowFiles into a new FlowFile based on its configuration, without comparing the records inside them.

MergeContent processor documentation:

"Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the Processor be configured with only a single incoming connection, as Group of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate."
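To see why duplicates appear, here is a plain-Python illustration of append-only merging; it is not MergeContent's implementation. Nothing compares the records, so a row present in both inputs ends up in the output twice. The sketch assumes both files share one header line and that `existing_csv` ends with a newline.

```python
def append_merge(existing_csv: str, incoming_csv: str) -> str:
    """Append every data row of incoming_csv to existing_csv, with no key check.

    The header line of incoming_csv is dropped on the assumption that both
    files share it; every other line is appended unchanged.
    """
    incoming_lines = incoming_csv.splitlines(keepends=True)
    return existing_csv + "".join(incoming_lines[1:])
```

For example, `append_merge("id,name\n1,alice\n2,bob\n", "id,name\n2,bob\n3,carol\n")` returns `"id,name\n1,alice\n2,bob\n2,bob\n3,carol\n"`, with `2,bob` duplicated, which matches the symptom described in the question.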

If you are trying to do an upsert operation, use Hive MERGE instead, as that statement is designed for this kind of use case, and define the logic for what needs to happen when a record already exists (WHEN MATCHED) and when it does not (WHEN NOT MATCHED).

Refer to this link for more details and an example of how to do the merge on the Hive side.

New Contributor

@Shu Thank you for your response! I have gone through Hive MERGE, and it seems to merge data from table to table. I am trying to merge two CSV files. Is there any other option for that?
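Outside Hive, one option for CSV files that arrive one after another is to remember the keys already seen and pass through only rows whose key is new. This is a minimal sketch, assuming a hypothetical `id` key column and an in-memory set of seen keys, so state is lost on restart and grows with the number of distinct keys. It is not a NiFi processor; doing the same inside a flow would need the seen keys stored somewhere shared across FlowFiles.

```python
import csv
import io


class SeenKeyFilter:
    """Pass through only CSV rows whose key has not appeared in earlier batches.

    Hypothetical helper: the key column name and in-memory state are
    illustrative assumptions, not part of any NiFi API.
    """

    def __init__(self, key: str = "id") -> None:
        self.key = key
        self.seen: set[str] = set()

    def filter_batch(self, batch_csv: str) -> list[dict[str, str]]:
        new_rows = []
        for row in csv.DictReader(io.StringIO(batch_csv)):
            k = row[self.key]
            if k not in self.seen:   # first time this key appears: keep the row
                self.seen.add(k)
                new_rows.append(row)
        return new_rows
```

With `f = SeenKeyFilter("id")`, the first call `f.filter_batch("id,name\n1,alice\n2,bob\n")` keeps both rows, and a second call `f.filter_batch("id,name\n2,bob\n3,carol\n")` keeps only the row with `id` 3.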