Created on 06-13-2018 11:42 AM - edited 09-16-2022 06:20 AM
I have two CSV files and want to merge them into a single CSV file using the Id column.
Created 06-13-2018 02:09 PM
Hi @rajat puchnanda,
If by merging you mean doing a union, you can use the MergeContent processor, provided the two CSV files have the same structure.
Best regards,
Michel
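To make "union" concrete, here is a minimal Python sketch of what it means for two CSV files with the same structure: the header is kept once and the rows are appended. This is not NiFi configuration, only an illustration of the operation; the function name `union_csv` and the sample rows are assumptions made for the example.

```python
import csv
import io


def union_csv(*csv_texts):
    """Concatenate CSV documents that share the same header (a UNION ALL)."""
    header = None
    rows = []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        current_header = next(reader)
        if header is None:
            header = current_header
        elif current_header != header:
            # A plain union only makes sense when the structure is identical.
            raise ValueError("CSV files do not share the same structure")
        rows.extend(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample data with an identical header in both files.
file1 = "deptid,firstname,lastname\n1,Aman,Sharma\n"
file2 = "deptid,firstname,lastname\n2,Raman,Verma\n"
merged = union_csv(file1, file2)
print(merged)
```

If the headers differ, the sketch raises an error instead of merging, which mirrors the "same structure" condition above.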
Created 06-13-2018 02:18 PM
Hi @msumbul
Thanks for your answer. I tried that, but it kept creating files again and again, and the data was getting duplicated.
I set it up as two GetFile processors --> MergeContent --> PutFile.
Created on 06-14-2018 06:11 AM - edited 08-17-2019 05:58 PM
Hi Michel,
The files all have different metadata, apart from one common column. Which value should I use for the Metadata Strategy property in the MergeContent processor?
Created 06-13-2018 02:14 PM
@Shu
InputFile 1
deptid | firstname | lastname |
1 | Aman | Sharma |
2 | Raman | Verma |
InputFile 2
deptid | salary | email |
1 | 20000 | abc@gmail.com |
2 | 30000 | bgf@gmail.com |
OutputFile (by merging file1 and file2):
deptid | firstname | lastname | salary | email |
1 | Aman | Sharma | 20000 | abc@gmail.com |
2 | Raman | Verma | 30000 | bgf@gmail.com |
The output should combine the rows of both files by deptid.
How can I get this output?
Created 06-13-2018 02:29 PM
Hi @rajat puchnanda,
Based on your example, you are trying to do a "join". NiFi is not an ETL tool but rather a flow manager: it lets you move data across systems and do some very simple transformations, such as CSV to Avro. You should not do computations or joins with NiFi.
For your use case it would be better to use another tool such as Hive, Spark, ...
Best regards,
Michel
Created on 06-13-2018 10:16 PM - edited 08-17-2019 05:59 PM
The merge you are expecting is a lookup in the departments table by department id.
For your case you can use the LookupRecord processor to look up deptid, get the salary and email, and add them to the record.
The LookupRecord processor supports several lookup controller services.
Load your InputFile 2 into one of the lookup services, then use LookupRecord to look up the deptid value and add the matching values to the record.
Refer to this and this link for more details on configuring and working with the LookupRecord processor.
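As a rough illustration of the lookup idea (not actual LookupRecord configuration), the Python sketch below loads InputFile 2 into a dictionary keyed by deptid and then enriches each record of InputFile 1 with the matching values. The helper names `build_lookup` and `enrich` are made up for this example, and the column name `email` is assumed from the description above.

```python
import csv
import io

# Sample data taken from the question; the "email" header is an assumption.
FILE1 = "deptid,firstname,lastname\n1,Aman,Sharma\n2,Raman,Verma\n"
FILE2 = "deptid,salary,email\n1,20000,abc@gmail.com\n2,30000,bgf@gmail.com\n"


def build_lookup(csv_text, key):
    """Index every row of the lookup file by its key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def enrich(csv_text, lookup, key, fields):
    """Add the requested lookup fields to each record of the main file."""
    enriched = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = lookup.get(row[key], {})
        for field in fields:
            # Records without a match get an empty value for the field.
            row[field] = match.get(field, "")
        enriched.append(row)
    return enriched


lookup = build_lookup(FILE2, "deptid")
records = enrich(FILE1, lookup, "deptid", ["salary", "email"])
for record in records:
    print(record)
```

This behaves like a left join: every record of InputFile 1 is kept, and records with no match in the lookup simply get empty fields.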
Created 06-14-2018 06:03 AM
Thank you @Shu.
Actually I am trying to merge (union) records based on Id. For example, in InputFile 1 the Id values can be 1, 2, 3
and in InputFile 2 the deptid values can be 3, 4, 5.
So it should merge all the records. Should I use the QueryRecord processor before or after MergeContent?
Created 06-14-2018 01:00 PM
Merging a group of flowfiles (or records) is possible with the MergeContent/MergeRecord processors.
Example:
If flowfile ff1 has records 1, 2, 3 and ff2 has records 3, 4, 5, the MergeContent/MergeRecord processors can merge these flowfiles into one containing 1, 2, 3, 3, 4, 5.
Merge here means combining the group of records/flowfiles (union all). If you want to remove duplicates (i.e. 3 is a duplicate record) from the combined flowfile content, you can use the QueryRecord processor with the row_number window function to eliminate them.
This scenario is possible in NiFi without using the LookupRecord processor.
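The two steps (union all, then dropping duplicates) can be sketched in plain Python as follows. This only illustrates the logic and is not a QueryRecord query; keeping the first record seen per id plays the role of filtering on row_number = 1 within each id. The names `union_all` and `drop_duplicates` and the single-column records are assumptions for the example.

```python
def union_all(*batches):
    """Concatenate record batches without removing anything."""
    merged = []
    for batch in batches:
        merged.extend(batch)
    return merged


def drop_duplicates(records, key):
    """Keep only the first record seen for each key value."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


# ff1 holds ids 1, 2, 3 and ff2 holds ids 3, 4, 5, as in the example above.
ff1 = [{"id": 1}, {"id": 2}, {"id": 3}]
ff2 = [{"id": 3}, {"id": 4}, {"id": 5}]

merged = union_all(ff1, ff2)
unique = drop_duplicates(merged, "id")

print([r["id"] for r in merged])
print([r["id"] for r in unique])
```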
But as you mentioned in one of your replies:
Scenario2:
InputFile 1
deptid | firstname | lastname |
1 | Aman | Sharma |
2 | Raman | Verma |
InputFile 2
deptid | salary | email |
1 | 20000 | abc@gmail.com |
2 | 30000 | bgf@gmail.com |
OutputFile (by merging file1 and file2):
deptid | firstname | lastname | salary | email |
1 | Aman | Sharma | 20000 | abc@gmail.com |
2 | Raman | Verma | 30000 | bgf@gmail.com |
This is not possible with MergeContent/MergeRecord, but you can try the QueryRecord processor, implementing a group by with collect-as-set (or some similar SQL logic) to transpose the data into your desired format. This query would be intensive if you are running it on a large number of records.
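To show what "group by deptid and combine the columns" means for Scenario2, here is a small Python sketch. It is only an illustration of the logic, not a QueryRecord query; the function name `combine_by_key` is made up, and the `email` column name is assumed.

```python
def combine_by_key(record_sets, key):
    """Group records from all inputs by key and merge their columns."""
    combined = {}
    columns = [key]
    for records in record_sets:
        for record in records:
            target = combined.setdefault(record[key], {key: record[key]})
            for name, value in record.items():
                if name not in columns:
                    columns.append(name)
                target[name] = value
    # Fill columns that a given key never received with an empty string.
    return [{c: row.get(c, "") for c in columns} for row in combined.values()]


file1 = [
    {"deptid": "1", "firstname": "Aman", "lastname": "Sharma"},
    {"deptid": "2", "firstname": "Raman", "lastname": "Verma"},
]
file2 = [
    {"deptid": "1", "salary": "20000", "email": "abc@gmail.com"},
    {"deptid": "2", "salary": "30000", "email": "bgf@gmail.com"},
]

output = combine_by_key([file1, file2], "deptid")
for row in output:
    print(" | ".join(row.values()))
```

A deptid that appears in only one of the inputs is still kept, with the missing columns left empty, so this acts like a full outer join. All groups are held in memory at once, which is one reason this kind of operation gets expensive on a large number of records.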