About moonspa

moonspa · 09-12-2024

Hi there,

I am an absolute NiFi beginner, and I would like to implement the following data integration (DI) pipeline using Apache NiFi.

Scenario: I have two independent data sources:
A CSV file.
A database.

Both contain streams of sorted lines with the same column structure. In addition to its other columns, each record includes:
A primary ID.
A calculated hash, generated from the content of the line.

Goal: I want to compare the records from both sources by their ID columns and classify them as:
New: The ID is present in the CSV file but not in the database.
Changed: The ID exists in both sources, but the hash values differ.
Same: The ID exists in both sources, and the hash values are the same.
Deleted: The ID is missing from the CSV file but exists in the database.

The output of this "diff" process should be a stream in which each line is enriched with a status flag ("new", "changed", "same", "deleted") indicating the result of the comparison.

Background: I have run this functionality successfully on the "Pentaho DI" platform for years. However, I am struggling to replicate it in Apache NiFi. I assume a processor for this must exist, but I haven't been able to find or configure it yet.

I would greatly appreciate any guidance or advice from the community. If anyone could point me in the right direction or suggest the appropriate NiFi components, that would be very helpful.

Thanks in advance!

Cheers,
Tomas.
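To make the comparison rules above concrete, here is a minimal sketch in plain Python of the merge-style diff I have in mind. It is not a NiFi processor or a NiFi configuration, only an illustration of the logic. It assumes both streams are sorted in ascending order by ID and that each record is reduced to an (ID, hash) pair; the names diff_sorted, csv_rows and db_rows are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: each record is reduced to (primary ID, content hash).
Record = Tuple[int, str]


def diff_sorted(csv_rows: Iterable[Record],
                db_rows: Iterable[Record]) -> Iterator[Tuple[int, str]]:
    """Yield (id, status) for two streams sorted ascending by ID."""
    csv_it, db_it = iter(csv_rows), iter(db_rows)
    c = next(csv_it, None)  # current CSV record, None when exhausted
    d = next(db_it, None)   # current database record, None when exhausted

    while c is not None or d is not None:
        if d is None or (c is not None and c[0] < d[0]):
            # ID only in the CSV file
            yield c[0], "new"
            c = next(csv_it, None)
        elif c is None or d[0] < c[0]:
            # ID only in the database
            yield d[0], "deleted"
            d = next(db_it, None)
        else:
            # ID in both sources: compare the hashes
            yield c[0], "same" if c[1] == d[1] else "changed"
            c = next(csv_it, None)
            d = next(db_it, None)


# Hypothetical sample data, sorted by ID
csv_rows = [(1, "a"), (2, "b"), (4, "d")]
db_rows = [(2, "b"), (3, "c"), (4, "x")]
result = list(diff_sorted(csv_rows, db_rows))
print(result)
```

Because both inputs are read once, in order, this only needs to hold one record per source in memory at a time, which is the behaviour I am hoping to reproduce in NiFi.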

Last Visited	09-17-2024 10:43 AM

Member Since	09-09-2024 03:51 PM
Posts	1

Cloudera Community

Apache NIFI: Merge of two lines from two sources w...