Created 06-23-2025 10:02 AM
https://medium.com/@surajnagendra/merge-csv-files-apache-nifi-21ba44e1b719
I tried this approach, but it didn't work well.
Created 06-26-2025 01:21 PM
If you know where the CSV files are on the filesystem and the condition is simple, you may be able to start with CSV file 1 and then use two LookupRecord processors in sequence with two CSVRecordLookupService controller services (pointing at CSV file 2 and CSV file 3, respectively). If that doesn't suit your needs, check out the ForkEnrichment and JoinEnrichment processors; they may be able to do what you need.
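To make the two-step lookup concrete, here is a small Python sketch of what the chain does to the data. It is not NiFi code: `load_lookup` and `enrich` are hypothetical helpers standing in for a CSVRecordLookupService and a LookupRecord processor, and the sample rows are made up. It assumes every CSV has a header row and that the join key is a column called `id` in all three files.

```python
import csv
import io


def load_lookup(csv_text, key_column):
    """Index CSV rows by key_column, standing in for a CSV-backed lookup service."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


def enrich(records, lookup, key_field):
    """Copy the matching lookup row's fields onto each record (hypothetical helper)."""
    enriched = []
    for record in records:
        merged = dict(record)
        match = lookup.get(record[key_field])
        if match is not None:
            for name, value in match.items():
                merged.setdefault(name, value)  # existing fields win on a name clash
        enriched.append(merged)
    return enriched


# Made-up sample data: CSV file 1 is the starting point, files 2 and 3 are lookups.
FILE1 = "id,name\n1,alice\n2,bob\n"
FILE2 = "id,city\n1,Paris\n"
FILE3 = "id,team\n2,blue\n"

records = list(csv.DictReader(io.StringIO(FILE1)))
step1 = enrich(records, load_lookup(FILE2, "id"), "id")  # first LookupRecord step
result = enrich(step1, load_lookup(FILE3, "id"), "id")   # second LookupRecord step
print(result)
```

In this sketch, records with no match pass through unchanged and fields already on the record are kept on a name clash; how the real processors handle those cases depends on their configuration, so check the LookupRecord documentation for your NiFi version.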
Created 06-23-2025 04:56 PM
@Bhar Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @MattWho @mburgess who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,
Created on 06-25-2025 04:21 AM - edited 06-25-2025 04:29 AM
If all of your files are in the same format, then MergeContent is the only option.
Here is one simple example that may give you a hint for solving your problem:
Use a RouteOnAttribute processor and connect it to a MergeRecord processor.
In the RouteOnAttribute processor, use the expression ${merge.count:equals(1)}.
That way, if there is only a single file, the flow ends there; otherwise, the FlowFiles go on to the MergeContent processor, which merges all the files.
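The routing decision above can be sketched in a few lines of Python. This only models the logic, not NiFi itself: `route` mimics a RouteOnAttribute property (given the hypothetical name `single`) holding `${merge.count:equals(1)}`, where attribute values are compared as strings, and `merge_same_format` models the merge step as concatenation that keeps one header row. That header handling is an assumption about the desired output for same-format CSV files, not a description of MergeContent's exact behavior.

```python
def route(attributes):
    """Mimic a RouteOnAttribute property holding ${merge.count:equals(1)}.

    'single' is a hypothetical property name; FlowFiles that do not match
    go to RouteOnAttribute's 'unmatched' relationship.
    """
    # FlowFile attributes are strings, so compare against "1", not 1.
    return "single" if attributes.get("merge.count") == "1" else "unmatched"


def merge_same_format(csv_texts):
    """Concatenate CSV texts that share one header, writing the header once."""
    header = csv_texts[0].splitlines()[0]
    rows = []
    for text in csv_texts:
        lines = text.splitlines()
        if lines[0] != header:
            raise ValueError("files are not in the same format")
        rows.extend(lines[1:])  # skip each file's own header row
    return "\n".join([header] + rows) + "\n"


print(route({"merge.count": "1"}))  # single: the flow ends here
print(route({"merge.count": "3"}))  # unmatched: goes on to be merged
print(merge_same_format(["id,name\n1,a\n", "id,name\n2,b\n"]), end="")
```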
Created 06-25-2025 06:01 AM
I'm using the QueryRecord processor in Apache NiFi to perform a LEFT JOIN between two sets of records within a single FlowFile. The records are distinguished by a field m, where m = 'a' represents one dataset and m = 'b' represents the other.
Here is the SQL query I'm using:
SELECT *
FROM (
    SELECT * FROM FLOWFILE WHERE m = 'a'
) file1
LEFT JOIN (
    SELECT * FROM FLOWFILE WHERE m = 'b'
) file2
ON file1.ID = file2.rapid_id
However, the result includes only records from the m = 'a' side. When I swap the inner queries (i.e., use m = 'b' as the left side), I get only records from that side instead. It seems the LEFT JOIN is not working as expected: it behaves more like an INNER JOIN.
Has anyone encountered this behavior with QueryRecord? Is there a limitation in how it handles subqueries or joins within a single FlowFile? Any guidance or workaround would be appreciated.
Thanks in advance!
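For comparison, the following Python sketch runs the same query shape against an in-memory SQLite table using the standard `sqlite3` module. It only illustrates standard SQL join semantics: QueryRecord uses Apache Calcite rather than SQLite, the sample rows are made up, and explicit columns replace `SELECT *` to keep the output short.

```python
import sqlite3

# Made-up stand-in for the single FlowFile: rows tagged m = 'a' or m = 'b'.
ROWS = [
    ("a", 1, None),
    ("a", 2, None),
    ("b", None, 1),
    ("b", None, 3),
]


def run(join_kind):
    """Run the question's query shape with the given join type and return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FLOWFILE (m TEXT, ID INTEGER, rapid_id INTEGER)")
    conn.executemany("INSERT INTO FLOWFILE VALUES (?, ?, ?)", ROWS)
    query = f"""
        SELECT file1.ID, file2.rapid_id
        FROM (SELECT * FROM FLOWFILE WHERE m = 'a') file1
        {join_kind} JOIN (SELECT * FROM FLOWFILE WHERE m = 'b') file2
        ON file1.ID = file2.rapid_id
        ORDER BY file1.ID
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


left = run("LEFT")
inner = run("INNER")
print(left)   # [(1, 1), (2, None)]
print(inner)  # [(1, 1)]
```

Under standard SQL semantics, a LEFT JOIN returns every left-side record (with NULLs where nothing matches), so the m = 'b' records appear only as extra columns on matching m = 'a' rows, and the unmatched 'b' row never shows up on its own. An INNER JOIN would additionally drop the unmatched 'a' row. Returning records from both sides would need a FULL OUTER JOIN or a UNION, depending on the goal.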