Created 07-17-2018 10:58 AM
I have two CSV files.
Sample files are below:
file1.csv:
Name,PAN,Organization,TIN
raj,Awppp1234R,Erica,EWUIP1876T
avinav,EOKLP8970Y,Optus,efgtu8976t
brijesh,Qoplo1987U,InfoGaint,rhfuo1348r
raj,Awppp1234R,Erica,EWUIP1876T
file2.csv:
Name,PAN,Organization,TIN
raj,Awppp1234R,Erica,EWUIP1876T
sanjay,RTRGH1679E,INFY,WJKOI1894G
himanshu,POLKJ1673T,data69,TVBHU186B
I want to use Apache NiFi to find the unique records across these two sample files, based on the PAN and TIN values.
So the output should look like this:
raj,Awppp1234R,Erica,EWUIP1876T
avinav,EOKLP8970Y,Optus,efgtu8976t
brijesh,Qoplo1987U,InfoGaint,rhfuo1348r
sanjay,RTRGH1679E,INFY,WJKOI1894G
himanshu,POLKJ1673T,data69,TVBHU186B
I am new to NiFi and don't know which processors I can use to solve this problem. Please let me know the complete flow to solve it.
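To pin down what "unique on the basis of PAN and TIN" means for this sample data, here is a minimal Python sketch (plain Python, not NiFi) that reproduces the expected output: both files are read in order, the header row is skipped, and only the first row seen for each (PAN, TIN) pair is kept. Treating the comparison as exact and case-sensitive, and keeping the first occurrence, are assumptions; the question does not say either way.

```python
import csv
import io

# Sample data from file1.csv and file2.csv in the question.
FILE1 = """Name,PAN,Organization,TIN
raj,Awppp1234R,Erica,EWUIP1876T
avinav,EOKLP8970Y,Optus,efgtu8976t
brijesh,Qoplo1987U,InfoGaint,rhfuo1348r
raj,Awppp1234R,Erica,EWUIP1876T
"""

FILE2 = """Name,PAN,Organization,TIN
raj,Awppp1234R,Erica,EWUIP1876T
sanjay,RTRGH1679E,INFY,WJKOI1894G
himanshu,POLKJ1673T,data69,TVBHU186B
"""


def unique_by_pan_tin(*files):
    """Return data rows from all files, keeping the first row seen for each exact (PAN, TIN) pair."""
    seen = set()
    result = []
    for text in files:
        reader = csv.DictReader(io.StringIO(text))  # the header row supplies the field names
        for row in reader:
            key = (row["PAN"], row["TIN"])  # exact, case-sensitive comparison (assumption)
            if key in seen:
                continue  # repeated PAN/TIN pair: drop the row
            seen.add(key)
            result.append(",".join(row[name] for name in reader.fieldnames))
    return result


for line in unique_by_pan_tin(FILE1, FILE2):
    print(line)
```

The printed lines match the expected output in the question, including a single copy of the `raj` record that appears twice in file1.csv and once in file2.csv.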
Created on 07-17-2018 05:17 PM - edited 08-18-2019 01:20 AM
-
Here is a simple flow that compares the lines of a CSV file and deletes any duplicates:
A template of the above flow is attached:
detect-duplicate-lines-in-csv.xml
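The attached template is not reproduced in this thread. As a rough sketch of the whole-line approach, assuming a flow along the lines of SplitText (one line per FlowFile) → HashContent → DetectDuplicate → MergeContent, the Python below mimics that logic outside NiFi: each line is hashed, the hash is checked against a set that stands in for DetectDuplicate's distributed cache, and only lines with an unseen hash are kept. The processor chain, the SHA-256 algorithm, and skipping the header row are all assumptions, not details taken from the template.

```python
import hashlib

FILE1 = """Name,PAN,Organization,TIN
raj,Awppp1234R,Erica,EWUIP1876T
avinav,EOKLP8970Y,Optus,efgtu8976t
brijesh,Qoplo1987U,InfoGaint,rhfuo1348r
raj,Awppp1234R,Erica,EWUIP1876T
"""

FILE2 = """Name,PAN,Organization,TIN
raj,Awppp1234R,Erica,EWUIP1876T
sanjay,RTRGH1679E,INFY,WJKOI1894G
himanshu,POLKJ1673T,data69,TVBHU186B
"""


def dedupe_whole_lines(*files):
    """Keep the first occurrence of each complete line across all files."""
    seen_hashes = set()  # stands in for DetectDuplicate's distributed map cache
    kept = []
    for text in files:
        for line in text.splitlines()[1:]:  # skip the header row (assumption)
            if not line.strip():
                continue  # ignore blank lines
            digest = hashlib.sha256(line.encode("utf-8")).hexdigest()  # like HashContent
            if digest in seen_hashes:
                continue  # DetectDuplicate would route this to "duplicate"
            seen_hashes.add(digest)
            kept.append(line)  # ...and this one to "non-duplicate"
    return kept


print("\n".join(dedupe_whole_lines(FILE1, FILE2)))
```

Because the whole line is hashed, two rows that share a PAN and TIN but differ in any other column would both be kept.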
If you want to compare only the PAN and TIN values of each line rather than the entire line, it gets a bit more complicated.
You would then need to extract the PAN and TIN values from the content and use the HashAttribute processor instead of HashContent.
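For the PAN/TIN-only variant, the key idea is that the duplicate check uses a hash of just those two values instead of the whole line. Here is a minimal Python sketch, assuming the four-column layout from the question: the regex stands in for an ExtractText processor capturing the 2nd and 4th columns (the pattern itself is hypothetical, not taken from the flow), the hash of the joined PAN and TIN stands in for HashAttribute, and the set again plays the role of DetectDuplicate's cache.

```python
import hashlib
import re

# Hypothetical ExtractText-style pattern for Name,PAN,Organization,TIN:
# group 1 captures PAN (2nd column), group 2 captures TIN (4th column).
PAN_TIN_PATTERN = re.compile(r"^[^,]*,([^,]*),[^,]*,([^,]*)$")


def dedupe_on_pan_tin(lines):
    """Keep the first line for each (PAN, TIN) pair; Name and Organization are ignored."""
    seen_keys = set()
    kept = []
    for line in lines:
        match = PAN_TIN_PATTERN.match(line)
        if match is None:
            continue  # not four columns ("unmatched" in ExtractText terms)
        pan, tin = match.groups()
        # The "|" separator keeps ("ab", "c") and ("a", "bc") from hashing to the same key.
        key = hashlib.sha256(f"{pan}|{tin}".encode("utf-8")).hexdigest()
        if key in seen_keys:
            continue  # same PAN and TIN as an earlier line: drop it
        seen_keys.add(key)
        kept.append(line)
    return kept


lines = [
    "raj,Awppp1234R,Erica,EWUIP1876T",
    "avinav,EOKLP8970Y,Optus,efgtu8976t",
    "Raj K,Awppp1234R,Erica Ltd,EWUIP1876T",  # hypothetical row: same PAN and TIN, different name/org
    "sanjay,RTRGH1679E,INFY,WJKOI1894G",
]
print("\n".join(dedupe_on_pan_tin(lines)))
```

The hypothetical `Raj K` row is dropped even though its name and organization differ, which is the behavioral difference from hashing the whole line.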
-
Hope this helps get you going.
-
Thank you,
Matt
Created on 07-17-2018 05:36 PM - edited 08-18-2019 01:20 AM
Here is the flow that could be used based on looking only at the PAN and TIN values in each line:
Created 07-17-2018 05:42 PM
For either of these examples, you will need to create a "demarcator" file on disk that contains a new line, and then point the associated property in the MergeContent processors at that file. This makes sure each FlowFile's content ends up on its own line in the merged file.
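Why the demarcator matters: if SplitText's Remove Trailing Newlines property is left at its default of true, each split FlowFile holds one record with no newline at the end, so merging without a demarcator glues the records together. The sketch below is plain Python, not NiFi: it writes a demarcator file containing only a newline (the file name is hypothetical) and joins the contents with it, mirroring how MergeContent inserts the demarcator between merged FlowFiles. In MergeContent this is, as far as I know, the Delimiter Strategy set to Filename together with the Demarcator File property.

```python
import os
import tempfile


def write_demarcator_file(directory):
    """Create a file whose only content is a newline and return its path."""
    path = os.path.join(directory, "newline-demarcator.txt")  # hypothetical file name
    # newline="" stops Python from translating "\n" into "\r\n" on Windows
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("\n")
    return path


def merge_contents(contents, demarcator_path=None):
    """Concatenate contents, inserting the demarcator file's text between entries only."""
    demarcator = ""
    if demarcator_path is not None:
        with open(demarcator_path, encoding="utf-8", newline="") as f:
            demarcator = f.read()
    return demarcator.join(contents)


split_lines = ["raj,Awppp1234R,Erica,EWUIP1876T", "sanjay,RTRGH1679E,INFY,WJKOI1894G"]

without_demarcator = merge_contents(split_lines)
with tempfile.TemporaryDirectory() as tmp:
    with_demarcator = merge_contents(split_lines, write_demarcator_file(tmp))

print(repr(without_demarcator))  # the two records run together on one line
print(with_demarcator)           # one record per line
```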
Created 07-18-2018 08:16 AM
Thanks @Matt Clarke. This solution worked very well for me. Thanks a lot.
Created 07-18-2018 01:37 PM
Please start a new forum question. I am probably not the best resource for SQL statements, and starting a new question will get you a faster response.
-
Thank you,
Matt