<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to find out duplicate rows (duplicacy to be checked on basis of 2 attributes ) in csv files using apache nifi ? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185649#M80816</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/86309/rinkya32.html" nodeid="86309"&gt;@Rinki&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Please start a new forum question. I am probably not the best resource for SQL statements. Starting a new question will get you a faster response.&lt;/P&gt;&lt;P&gt;-&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
    <pubDate>Wed, 18 Jul 2018 20:37:54 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2018-07-18T20:37:54Z</dc:date>
    <item>
      <title>How to find out duplicate rows (duplicacy to be checked on basis of 2 attributes ) in csv files using apache nifi ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185644#M80811</link>
      <description>&lt;P&gt;I have two CSV files.&lt;/P&gt;&lt;P&gt;Sample files are below:&lt;/P&gt;&lt;P&gt;file1.csv:&lt;/P&gt;&lt;P&gt;Name,PAN,Organization,TIN&lt;/P&gt;&lt;P&gt;raj,Awppp1234R,Erica,EWUIP1876T&lt;/P&gt;&lt;P&gt;avinav,EOKLP8970Y,Optus,efgtu8976t&lt;/P&gt;&lt;P&gt;brijesh,Qoplo1987U,InfoGaint,rhfuo1348r&lt;/P&gt;&lt;P&gt;raj,Awppp1234R,Erica,EWUIP1876T&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;file2.csv:&lt;/P&gt;&lt;P&gt;Name,PAN,Organization,TIN&lt;/P&gt;&lt;P&gt;raj,Awppp1234R,Erica,EWUIP1876T&lt;/P&gt;&lt;P&gt;sanjay,RTRGH1679E,INFY,WJKOI1894G&lt;/P&gt;&lt;P&gt;himanshu,POLKJ1673T,data69,TVBHU186B&lt;/P&gt;&lt;P&gt;I want to find the unique records between these 2 sample files on the basis of PAN and TIN using Apache NiFi.&lt;/P&gt;&lt;P&gt;So the output should look like this:&lt;/P&gt;&lt;P&gt;raj,Awppp1234R,Erica,EWUIP1876T&lt;/P&gt;&lt;P&gt;avinav,EOKLP8970Y,Optus,efgtu8976t&lt;/P&gt;&lt;P&gt;brijesh,Qoplo1987U,InfoGaint,rhfuo1348r&lt;/P&gt;&lt;P&gt;sanjay,RTRGH1679E,INFY,WJKOI1894G&lt;/P&gt;&lt;P&gt;himanshu,POLKJ1673T,data69,TVBHU186B&lt;/P&gt;&lt;P&gt;I am new to NiFi and don't know which processors I can use to solve this problem. Please let me know the complete flow to solve it.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jul 2018 17:58:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185644#M80811</guid>
      <dc:creator>rinkya32</dc:creator>
      <dc:date>2018-07-17T17:58:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to find out duplicate rows (duplicacy to be checked on basis of 2 attributes ) in csv files using apache nifi ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185645#M80812</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/86309/rinkya32.html" nodeid="86309" target="_blank"&gt;@Rinky Arora&lt;/A&gt;&lt;P&gt;-&lt;/P&gt;&lt;P&gt;Here is a simple flow that will compare lines of a CSV file and delete any that are duplicates:&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="80576-screen-shot-2018-07-17-at-10702-pm.png" style="width: 956px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18871iF6BE3E6B11018D22/image-size/medium?v=v2&amp;amp;px=400" role="button" title="80576-screen-shot-2018-07-17-at-10702-pm.png" alt="80576-screen-shot-2018-07-17-at-10702-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;A template of the above flow is attached:&lt;BR /&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/80577-detect-duplicate-lines-in-csv.xml" target="_blank"&gt;detect-duplicate-lines-in-csv.xml&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;If you want to compare only the PAN and TIN values of each line rather than the entire line, it gets a bit more complicated.&lt;BR /&gt;You would then need to extract the PAN and TIN values from the content and use the HashAttribute processor instead of HashContent.&lt;/P&gt;&lt;P&gt;-&lt;/P&gt;&lt;P&gt;Hope this helps get you going.&lt;BR /&gt;-&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&lt;BR /&gt;-&lt;BR /&gt;If you found this answer addressed your original question, please take a moment to log in and click "Accept" below the answer.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 08:20:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185645#M80812</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2019-08-18T08:20:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to find out duplicate rows (duplicacy to be checked on basis of 2 attributes ) in csv files using apache nifi ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185646#M80813</link>
      <description>&lt;P&gt;Here is a flow that could be used to compare only the PAN and TIN values in each line:&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="80578-screen-shot-2018-07-17-at-12703-pm.png" style="width: 985px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18870i1333472BA3F2BB19/image-size/medium?v=v2&amp;amp;px=400" role="button" title="80578-screen-shot-2018-07-17-at-12703-pm.png" alt="80578-screen-shot-2018-07-17-at-12703-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/80579-detect-duplicate-attr-in-csv.xml" target="_blank"&gt;detect-duplicate-attr-in-csv.xml&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 08:20:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185646#M80813</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2019-08-18T08:20:16Z</dc:date>
    </item>
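    <!--
    Editorial sketch, not part of the original thread and not a reproduction of the attached NiFi templates.
    It illustrates in plain Python the idea behind the PAN/TIN comparison discussed in the items above:
    build a key from the PAN and TIN fields of each row and keep only the first row seen for each key.
    The function name unique_rows and the exact, case-sensitive key comparison are assumptions made for
    this illustration; the sample rows are the ones posted in the question.

```python
import csv
import io

# Sample data copied from the question in this thread.
FILE1 = (
    "Name,PAN,Organization,TIN\n"
    "raj,Awppp1234R,Erica,EWUIP1876T\n"
    "avinav,EOKLP8970Y,Optus,efgtu8976t\n"
    "brijesh,Qoplo1987U,InfoGaint,rhfuo1348r\n"
    "raj,Awppp1234R,Erica,EWUIP1876T\n"
)
FILE2 = (
    "Name,PAN,Organization,TIN\n"
    "raj,Awppp1234R,Erica,EWUIP1876T\n"
    "sanjay,RTRGH1679E,INFY,WJKOI1894G\n"
    "himanshu,POLKJ1673T,data69,TVBHU186B\n"
)


def unique_rows(csv_texts, key_columns=("PAN", "TIN")):
    """Hypothetical helper: keep the first row seen for each key across all inputs."""
    seen = set()      # keys (PAN, TIN) already emitted
    header = None     # column order taken from the first input
    kept = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = list(reader.fieldnames)
        for row in reader:
            # Assumption: keys are compared exactly, including letter case.
            key = tuple(row[col] for col in key_columns)
            if key in seen:
                continue  # duplicate on PAN and TIN, drop it
            seen.add(key)
            kept.append([row[col] for col in header])
    return header, kept


if __name__ == "__main__":
    header, rows = unique_rows([FILE1, FILE2])
    print(",".join(header))
    for r in rows:
        print(",".join(r))
```

    Run against the two sample files, this keeps the five rows listed as the desired output in the question.
    -->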
    <item>
      <title>Re: How to find out duplicate rows (duplicacy to be checked on basis of 2 attributes ) in csv files using apache nifi ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185647#M80814</link>
      <description>&lt;P&gt;For either of these examples, you will need to create a "demarcator" file on disk that contains a newline, and then point at that file in the associated configuration of the MergeContent processors so that the merged file has one FlowFile's content per line.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jul 2018 00:42:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185647#M80814</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2018-07-18T00:42:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to find out duplicate rows (duplicacy to be checked on basis of 2 attributes ) in csv files using apache nifi ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185648#M80815</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/525/mclarke.html" nodeid="525"&gt;@Matt Clarke&lt;/A&gt; . This solution worked very well for me. Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jul 2018 15:16:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185648#M80815</guid>
      <dc:creator>rinkya32</dc:creator>
      <dc:date>2018-07-18T15:16:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to find out duplicate rows (duplicacy to be checked on basis of 2 attributes ) in csv files using apache nifi ?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185649#M80816</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/86309/rinkya32.html" nodeid="86309"&gt;@Rinki&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Please start a new forum question. I am probably not the best resource for SQL statements. Starting a new question will get you a faster response.&lt;/P&gt;&lt;P&gt;-&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jul 2018 20:37:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-find-out-duplicate-rows-duplicacy-to-be-checked-on/m-p/185649#M80816</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2018-07-18T20:37:54Z</dc:date>
    </item>
  </channel>
</rss>

