Support Questions

Find answers, ask questions, and share your expertise

how to join three csv files like sql on condition in Apache Nifi?

avatar
Explorer
1 ACCEPTED SOLUTION

avatar
Master Guru

If you know where the CSV files are on the filesystem and the condition is simple, you may be able to start with CSV file 1 then use 2 LookupRecord processors in sequence with 2 CSVRecordLookupService controller services (each pointing at CSV file 2 and 3 respectively). If that doesn't suit your needs, check out the ForkEnrichment and JoinEnrichment processors, they may be able to do what you need.

View solution in original post

4 REPLIES 4

avatar
Community Manager

@Bhar Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our NiFi experts @MattWho @mburgess  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Senior Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Expert Contributor

If you are having all files in same format, then mergecontent is only option. 

One simple example that may give you hint to solve your problem.

Use the RouteonAttribute processor and connect to merger record processor

saquibsk_0-1750850767991.png

On the  RouteonAttribute  processor use the configuration ${merge.count:equals(1)}

saquibsk_2-1750850817694.png

let say if single file then it will end. else it will go to mergecontent processor and merge all files.

 

 

 

 

Shakib M.

avatar
Explorer

I'm using the QueryRecord processor in Apache NiFi to perform a LEFT JOIN between two sets of records within a single FlowFile. The records are distinguished by a field m, where m = 'a' represents one dataset and m = 'b' represents the other.

Here is the SQL query I'm using:
SELECT *
FROM (
SELECT * FROM FLOWFILE WHERE m = 'a'
) file1
LEFT JOIN (
SELECT * FROM FLOWFILE WHERE m = 'b'
) file2
ON file1.ID = file2.rapid_id




However, the result only includes records from the m = 'a' side. When I switch the inner queries (i.e., use m = 'b' as the left side), I only get records from that side instead. It seems the LEFT JOIN is not functioning as expected — it behaves more like an INNER JOIN.

Has anyone encountered this behavior with QueryRecord? Is there a limitation in how it handles subqueries or joins within a single FlowFile? Any guidance or workaround would be appreciated.

Thanks in advance!

 

avatar
Master Guru

If you know where the CSV files are on the filesystem and the condition is simple, you may be able to start with CSV file 1 then use 2 LookupRecord processors in sequence with 2 CSVRecordLookupService controller services (each pointing at CSV file 2 and 3 respectively). If that doesn't suit your needs, check out the ForkEnrichment and JoinEnrichment processors, they may be able to do what you need.