<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Get duplicate records in MySql in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389203#M246913</link>
    <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;for your quick response.&lt;BR /&gt;&lt;BR /&gt;But I have another "&lt;STRONG&gt;PutDatabaseRecord&lt;/STRONG&gt;" in work flow referring database table called "&lt;STRONG&gt;Transaction&lt;/STRONG&gt;". It includes Primary key called "&lt;STRONG&gt;transaction_id&lt;/STRONG&gt;" without auto incrementing&amp;nbsp; &amp;nbsp;and "&lt;STRONG&gt;transaction_id&lt;/STRONG&gt;" is the value of "&lt;STRONG&gt;Update key&lt;/STRONG&gt;" in "&lt;STRONG&gt;PutDatabaseRecord&lt;/STRONG&gt;".&lt;BR /&gt;In here, update and insert process are working properly. What is the reason for difference.&amp;nbsp; &amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 14 Jun 2024 04:52:35 GMT</pubDate>
    <dc:creator>Thar11027</dc:creator>
    <dc:date>2024-06-14T04:52:35Z</dc:date>
    <item>
      <title>Get duplicate records in MySql</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389151#M246904</link>
      <description>&lt;P&gt;I'm using "&lt;STRONG&gt;PutDatabaseRecord (NiFi 1.25.0)&lt;/STRONG&gt;" for data insert/update purposes in MySql.&lt;BR /&gt;Imagine,&amp;nbsp;&lt;BR /&gt;I have a table called "Customer" which has running primary key called "&lt;STRONG&gt;Id&lt;/STRONG&gt;" and the another field call "&lt;STRONG&gt;id_from_core&lt;/STRONG&gt;". The only mentioned "&lt;STRONG&gt;Id&lt;/STRONG&gt;" field as &lt;STRONG&gt;auto generated primary key&lt;/STRONG&gt; in "&lt;STRONG&gt;Customer&lt;/STRONG&gt;" table.&amp;nbsp;&lt;BR /&gt;But when data insert through the NiFi flow, I mentioned "&lt;STRONG&gt;id_from_core&lt;/STRONG&gt;" field as "&lt;STRONG&gt;Update Key&lt;/STRONG&gt;" property in "&lt;STRONG&gt;PutDatabaseRecord&lt;/STRONG&gt;".&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Thar11027_0-1718272973115.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/40970i4784802F83971A40/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Thar11027_0-1718272973115.png" alt="Thar11027_0-1718272973115.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Here is the "&lt;STRONG&gt;Customer&lt;/STRONG&gt;" table,&lt;BR /&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;CREATE TABLE `Customer` (&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;`id` bigint(20) NOT NULL AUTO_INCREMENT,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;`created_by` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;`created_date` datetime(0) NULL DEFAULT NULL,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;`record_status` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;`surname` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;`&lt;STRONG&gt;id_from_core&lt;/STRONG&gt;` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;PRIMARY KEY (`id`) USING BTREE&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;) ENGINE = InnoDB AUTO_INCREMENT = 175 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;&lt;BR /&gt;&lt;BR /&gt;&lt;/FONT&gt;My problem is,&amp;nbsp; update action is not work properly, while inserting same&amp;nbsp; values in "&lt;STRONG&gt;id_from_core" &lt;/STRONG&gt;field data, it should be a update, but it's not updated. Insert as a new record. Could you please anyone know the reason for that matter and what is the solution for that.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 10:11:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389151#M246904</guid>
      <dc:creator>Thar11027</dc:creator>
      <dc:date>2024-06-13T10:11:27Z</dc:date>
    </item>
    <item>
      <title>Re: Get duplicate records in MySql</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389155#M246905</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/110964"&gt;@Thar11027&lt;/a&gt; ,&lt;/P&gt;&lt;P&gt;I dont think there is an UPSERT statement in MySQL if Im not wrong. I think its treating it as regular insert and hence you are seeing duplicate entries. If you want to use PUTDatabaseRecord processor then you have to create two: one for insert and another for update and to decide which one you need to run you have to do Lookup to see if the customer with the same core id exists or not. For that you can use Lookup Record&amp;nbsp; (refer to : &lt;A href="https://community.cloudera.com/t5/Community-Articles/Data-flow-enrichment-with-NiFi-part-1-LookupRecord-processor/ta-p/246940" target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/Data-flow-enrichment-with-NiFi-part-1-LookupRecord-processor/ta-p/246940&lt;/A&gt; )processor&amp;nbsp; to enrich your data with the customer core Id if exists, then you check if the record is found (meaning id exist) you route to Update otherwise you route to Insert.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope that helps. If it does please accept solution.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 11:43:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389155#M246905</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-06-13T11:43:31Z</dc:date>
    </item>
    <item>
      <title>Re: Get duplicate records in MySql</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389203#M246913</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;for your quick response.&lt;BR /&gt;&lt;BR /&gt;But I have another "&lt;STRONG&gt;PutDatabaseRecord&lt;/STRONG&gt;" in work flow referring database table called "&lt;STRONG&gt;Transaction&lt;/STRONG&gt;". It includes Primary key called "&lt;STRONG&gt;transaction_id&lt;/STRONG&gt;" without auto incrementing&amp;nbsp; &amp;nbsp;and "&lt;STRONG&gt;transaction_id&lt;/STRONG&gt;" is the value of "&lt;STRONG&gt;Update key&lt;/STRONG&gt;" in "&lt;STRONG&gt;PutDatabaseRecord&lt;/STRONG&gt;".&lt;BR /&gt;In here, update and insert process are working properly. What is the reason for difference.&amp;nbsp; &amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2024 04:52:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389203#M246913</guid>
      <dc:creator>Thar11027</dc:creator>
      <dc:date>2024-06-14T04:52:35Z</dc:date>
    </item>
    <item>
      <title>Re: Get duplicate records in MySql</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389212#M246919</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/110964"&gt;@Thar11027&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I stand corrected. Well, lets be more specific and you can't get more specific than looking the code itself in github :). It turns out the &lt;A href="https://github.com/apache/nifi/blob/main/nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java" target="_self"&gt;PutDatabaseRecord&lt;/A&gt; uses a DatabaseAdapter which is an interface type that gets implement by each Database Engine Type and passed through the DB service associated with this processor (DBCPConnectionPool). Those adapters are responsible for generating the SQL for each statement type (insert, update, delete....). For MySql there is an adapter called &lt;A href="http://nifi-extension-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/db/impl/MySQLDatabaseAdapter.java" target="_self"&gt;MySQLDatabaseAdapter&lt;/A&gt; and if you look at the genereateUpsertStatement method you will find that it uses the following syntax:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;StringBuilder statementStringBuilder = new StringBuilder("INSERT INTO ")
                .append(table)
                .append("(").append(columns).append(")")
                .append(" VALUES ")
                .append("(").append(parameterizedInsertValues).append(")")
                .append(" ON DUPLICATE KEY UPDATE ")
                .append(parameterizedUpdateValues);
        return statementStringBuilder.toString();&lt;/LI-CODE&gt;&lt;P&gt;Notice the use of "ON DUPLICATE KEY UPDATE" syntax. If you look for what that means in MySQL (&lt;A href="https://blog.devart.com/mysql-upsert.html" target="_blank"&gt;https://blog.devart.com/mysql-upsert.html&lt;/A&gt; )&amp;nbsp; you will find that yes it will check if the record key exists or not , and if it does then it will do an update state &lt;U&gt;&lt;STRONG&gt;however that only works on table Primary Key&lt;/STRONG&gt;&lt;/U&gt;. In your case for the &lt;STRONG&gt;Transaction&lt;/STRONG&gt; tale it works because as you mentioned the &lt;STRONG&gt;transaction_id&lt;/STRONG&gt; is the primary key and you probably passing this column as part of the record data, however for the other table the id set to auto increment and probably you are not passing it as part of the record and instead relying on none primary key id_from_core. Not sure if its possible to change your table where this column is your primary key, otherwise you will find yourself having to do lookup to find if it exists or not and may be get the id then do your upsert with the id but Im not sure how this will work with Auto Increment being set. Another option which I tend to do in my case to avoid&amp;nbsp; adding more processors\control services is to create stored proc that will defer all that checking for update or insert to sql then use PutSQL processor to execute the stored proc passing all columns to it but this can be cumbersome if you have so many columns which seem to be your case. What you can do to avoid passing each column is pass record as json string and do json parsing to find the column values in mySQL.&lt;/P&gt;&lt;P&gt;Hope that helps.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jun 2024 09:12:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-duplicate-records-in-MySql/m-p/389212#M246919</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-06-14T09:12:32Z</dc:date>
    </item>
  </channel>
</rss>

