Merge two CSV files in NiFi


I have two CSV files and want to merge them into a single CSV file using the Id column.


Expert Contributor

Hi @rajat puchnanda,

If by merging you mean doing a union, you can use the MergeContent processor, provided the two CSV files have the same structure.
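To make the "union" idea concrete outside of NiFi, here is a minimal Python sketch of what concatenating two same-structure CSV files amounts to. The helper name `union_csv` and the sample rows are hypothetical and only for illustration; this is not how MergeContent is implemented.

```python
import csv
import io


def union_csv(first: str, second: str) -> str:
    """Concatenate two CSV texts that share the same header, keeping one header."""
    rows_a = list(csv.reader(io.StringIO(first)))
    rows_b = list(csv.reader(io.StringIO(second)))
    if rows_a[0] != rows_b[0]:
        raise ValueError("CSV files do not have the same structure")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows_a)      # header and rows of the first file
    writer.writerows(rows_b[1:])  # rows of the second file, header skipped
    return out.getvalue()


# Hypothetical sample data with identical columns
file1 = "id,name\n1,Aman\n2,Raman\n"
file2 = "id,name\n3,Rohit\n"
merged = union_csv(file1, file2)
print(merged)
```

The header check mirrors the "same structure" condition: if the columns differ, a plain union no longer makes sense.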

Best regards,
Michel


Hi @msumbul

Thanks for your answer. I tried that, but it kept creating files again and again, and the data was getting duplicated.

I set it up like this: two GetFile processors --> MergeContent --> PutFile.


Hi Michel,

All the files have different metadata except for one common column. Which value should I use for the Metadata Strategy property in the MergeContent processor?

(Attached screenshot: 78411-mergecontent.png)


@Shu

InputFile 1

deptid,firstname,lastname
1,Aman,Sharma
2,Raman,Verma

InputFile 2

deptid,salary,email
1,20000,abc@gmail.com
2,30000,bgf@gmail.com

OutputFile (by merging file 1 and file 2):

deptid,firstname,lastname,salary,email
1,Aman,Sharma,20000,abc@gmail.com
2,Raman,Verma,30000,bgf@gmail.com

That is, the output is grouped by deptid.

How can I get this output?

Expert Contributor

Hi @rajat puchnanda,

Based on your example, you are trying to do a "join". NiFi is not an ETL tool but rather a flow manager: it lets you move data across systems and perform very simple transformations, such as CSV to Avro. You should not do computations or joins with NiFi.

For your use case it would be better to use another tool, such as Hive or Spark.

Best regards,
Michel

Master Guru

@rajat puchnanda

The merge you are expecting is a lookup in the departments table by department ID.

For your case you can use the LookupRecord processor to look up the deptid, get the salary and email, and add them to the record.

The LookupRecord processor supports the lookup controller services shown in the attached screenshot (78410-lookuprecord-controller-services.png).

Load your input file 2 into one of the lookup services, then use LookupRecord to look up the deptid value and add the returned values to the record.

Refer to this and this link for more details on configuring and working with the LookupRecord processor.
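To show the lookup idea in isolation, here is a rough Python sketch, assuming file 2 is indexed by deptid and each record of file 1 is enriched from that index. The helper names `build_lookup` and `enrich` are hypothetical; this only mimics the concept and is not NiFi configuration or the processor's actual code.

```python
import csv
import io

# Sample data from this thread; in NiFi, file 2 would be loaded into a lookup service.
FILE1 = "deptid,firstname,lastname\n1,Aman,Sharma\n2,Raman,Verma\n"
FILE2 = "deptid,salary,email\n1,20000,abc@gmail.com\n2,30000,bgf@gmail.com\n"


def build_lookup(csv_text, key):
    """Index the lookup file by its key column (hypothetical helper)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def enrich(csv_text, lookup, key):
    """Add the looked-up columns to each record; unmatched records stay unchanged."""
    enriched = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = lookup.get(row[key], {})
        enriched.append({**row, **match})
    return enriched


lookup = build_lookup(FILE2, "deptid")
records = enrich(FILE1, lookup, "deptid")
for record in records:
    print(record)
```

Each enriched record carries deptid, firstname, lastname, salary and email, which matches the output file requested earlier in the thread.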


Thank you @Shu.

Actually, I am trying to merge (union) records based on Id. For example, in input file 1 the Id values can be 1, 2, 3

and in input file 2 the deptid values can be 3, 4, 5.

So it will merge all the records. Should I use the QueryRecord processor before or after MergeContent?

Master Guru
@rajat puchnanda

Merging a group of flowfiles (or records) is possible with the MergeContent/MergeRecord processors.

Example:

If flowfile ff1 has records 1, 2, 3 and ff2 has records 3, 4, 5, then the MergeContent/MergeRecord processors can merge these flowfiles into one containing 1, 2, 3, 3, 4, 5.

Merge means combining the group of records/flowfiles (union all). If you want to remove duplicates (here, 3 is a duplicate record) from the combined flowfile content, you can use the QueryRecord processor with the row_number window function to eliminate them.

This scenario is possible in NiFi without using the LookupRecord processor.
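As a small illustration of the union-then-deduplicate step, here is a Python sketch that keeps the first record seen for each id, which is roughly what filtering on row_number = 1 per id would achieve. The helper name `drop_duplicates` is hypothetical, and the records are reduced to just an id field for brevity.

```python
def drop_duplicates(records, key):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


ff1 = [{"id": 1}, {"id": 2}, {"id": 3}]
ff2 = [{"id": 3}, {"id": 4}, {"id": 5}]

merged = ff1 + ff2                       # union all: ids 1, 2, 3, 3, 4, 5
deduped = drop_duplicates(merged, "id")  # duplicate id 3 is dropped
print([record["id"] for record in deduped])
```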

But as you mentioned in one of your replies:

Scenario 2:

InputFile 1

deptid,firstname,lastname
1,Aman,Sharma
2,Raman,Verma

InputFile 2

deptid,salary,email
1,20000,abc@gmail.com
2,30000,bgf@gmail.com

OutputFile (by merging file 1 and file 2):

deptid,firstname,lastname,salary,email
1,Aman,Sharma,20000,abc@gmail.com
2,Raman,Verma,30000,bgf@gmail.com

This is not possible with MergeContent/MergeRecord, but you can try the QueryRecord processor, implementing a group by with collect-as-set (or some similar SQL logic) to transpose the data into your desired format. This query would be expensive if you run it on a large number of records.
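To show what "group by deptid and collect the columns" means for Scenario 2, here is a minimal Python sketch that assumes the records of both files have already been merged into one list. The helper name `group_by_key` is hypothetical, and this is plain Python rather than the SQL you would write in QueryRecord.

```python
# Records from both input files after a plain merge (union all), as in Scenario 2
MERGED = [
    {"deptid": "1", "firstname": "Aman", "lastname": "Sharma"},
    {"deptid": "2", "firstname": "Raman", "lastname": "Verma"},
    {"deptid": "1", "salary": "20000", "email": "abc@gmail.com"},
    {"deptid": "2", "salary": "30000", "email": "bgf@gmail.com"},
]


def group_by_key(records, key):
    """Fold all records sharing the same key value into a single wide record."""
    grouped = {}
    for record in records:
        grouped.setdefault(record[key], {}).update(record)
    return list(grouped.values())


rows = group_by_key(MERGED, "deptid")
columns = ["deptid", "firstname", "lastname", "salary", "email"]
print(",".join(columns))
for row in rows:
    # Missing columns are written as empty values
    print(",".join(row.get(column, "") for column in columns))
```

Every record has to be held in memory until its group is complete, which is one way to see why this kind of grouping gets costly on large inputs.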