<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question MiNiFi is sending duplicate files to NiFi in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/MiNiFi-is-sending-duplicate-files-to-NiFi/m-p/302761#M221237</link>
    <description>&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;I am using MiNiFi (0.5.0) to pick up files from a Linux machine and transfer them to NiFi (1.9.1). In a few cases I can see duplicate files being transferred to NiFi.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;The flow is set up as follows:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;GetFile -&amp;gt; LogAttribute -&amp;gt; PutFile (archive) -&amp;gt; RemoteProcessGroup&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;Log 1:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="book antiqua,palatino,serif" size="2" color="#0000FF"&gt;minifi-app.log:2020-09-13 06:09:00,760 INFO [Timer-Driven Process Thread-4] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=Input_Port_GES,targets=&lt;A href="https://nifi1.myorg.com:7071/nifi" target="_blank" rel="noopener"&gt;https://nifi1.myorg.com:7071/nifi&lt;/A&gt;] Successfully sent [StandardFlowFileRecord[uuid=345d9b6d-e9f7-4dd8-ad9a-a9d66fdfd902,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1599937413340-1, container=default, section=1], offset=1073154, length=237],offset=0,name=&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;RandomFile1154.txt,&lt;/STRONG&gt;&lt;/FONT&gt;size=237]] (237 bytes) to nifi://nifi1.myorg.com:7074 in 48 milliseconds at a rate of 4.74 KB/sec&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;Log 2:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2" color="#0000FF"&gt;minifi-app.log:2020-09-13 06:09:01,910 INFO [Timer-Driven Process Thread-5] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=Input_Port_GES,targets=&lt;A href="https://nifi1.myorg.com:7071/nifi" target="_blank" rel="noopener"&gt;https://nifi1.myorg.com:7071/nifi&lt;/A&gt;] Successfully sent [StandardFlowFileRecord[uuid=f74eb941-a233-4f9e-86ff-07723940f012,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1599937413340-1, container=default, section=1], offset=1109014, length=237],offset=0,name=&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;RandomFile1154.txt,&lt;/STRONG&gt;&lt;/FONT&gt;size=237], &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#0000FF"&gt;StandardFlowFileRecord[uuid=522a4350-4cab-476c-a087-a3793101412e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1599937413340-1, container=default, section=1], offset=1074338, length=235],offset=0,name=&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;RandomFile1346.txt,&lt;/STRONG&gt;&lt;/FONT&gt;size=235]] (472 bytes) to nifi://nifi1.myorg.com:7074 in 30 milliseconds at a rate of 14.9 KB/sec&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;The logs above show that RandomFile1154.txt is transferred once at 2020-09-13 06:09:00,760 and then again at 2020-09-13 06:09:01,910, together with RandomFile1346.txt.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;I went through the StandardRemoteGroupPort code, and I can see that once a transfer succeeds, the session is committed, so the FlowFile should not be available for the next transfer.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;I added logging to check whether my GetFile picked up the file twice, but that is not the case: the log entry was printed only once.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="book antiqua,palatino,serif"&gt;Please share your thoughts on this.&lt;/FONT&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 13 Sep 2020 12:22:14 GMT</pubDate>
    <dc:creator>Umakanth</dc:creator>
    <dc:date>2020-09-13T12:22:14Z</dc:date>
    <item>
      <title>MiNiFi is sending duplicate files to NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/MiNiFi-is-sending-duplicate-files-to-NiFi/m-p/302761#M221237</link>
      <pubDate>Sun, 13 Sep 2020 12:22:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/MiNiFi-is-sending-duplicate-files-to-NiFi/m-p/302761#M221237</guid>
      <dc:creator>Umakanth</dc:creator>
      <dc:date>2020-09-13T12:22:14Z</dc:date>
    </item>
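    <!--
The two "Successfully sent" log lines above can be checked mechanically. The Python sketch below extracts the uuid, name and content claim from every StandardFlowFileRecord in those lines. It then reports any filename that was sent more than once.

- A repeated uuid means the same FlowFile was sent again.
- Different uuids mean distinct FlowFiles that carry the same filename.

This is a minimal sketch only:

- The regular expression was written against the sample lines in this thread, which are abbreviated below with the HTML markup removed. It is an assumption, and other NiFi or MiNiFi versions may format the record differently.
- The function names are hypothetical and are not part of NiFi or MiNiFi.

```python
import re
from collections import defaultdict

# One StandardFlowFileRecord[...] entry as printed in the "Successfully sent"
# lines above. Assumption: this pattern targets those sample lines only.
RECORD_RE = re.compile(
    r"StandardFlowFileRecord\[uuid=(?P<uuid>[0-9a-f-]+),"
    r".*?StandardResourceClaim\[id=(?P<claim_id>[^,]+),"
    r".*?offset=(?P<offset>\d+), length=(?P<length>\d+)\]"
    r",offset=\d+,name=(?P<name>[^,]+),size=\d+\]"
)


def sent_records(log_lines):
    """Yield (uuid, name, (claim id, offset, length)) for each FlowFile sent."""
    for line in log_lines:
        if "Successfully sent" not in line:
            continue
        for m in RECORD_RE.finditer(line):
            claim = (m["claim_id"], int(m["offset"]), int(m["length"]))
            yield m["uuid"], m["name"], claim


def names_sent_more_than_once(log_lines):
    """Map each filename sent more than once to the uuids it was sent under.

    A repeated uuid means the same FlowFile was sent again; distinct uuids
    mean different FlowFiles that carry the same filename.
    """
    by_name = defaultdict(list)
    for uuid, name, _claim in sent_records(log_lines):
        by_name[name].append(uuid)
    return {name: uuids for name, uuids in by_name.items() if len(uuids) > 1}


# The two log lines from the question, abbreviated and with HTML removed.
LOG = [
    "2020-09-13 06:09:00,760 INFO Successfully sent [StandardFlowFileRecord["
    "uuid=345d9b6d-e9f7-4dd8-ad9a-a9d66fdfd902,claim=StandardContentClaim "
    "[resourceClaim=StandardResourceClaim[id=1599937413340-1, container=default, "
    "section=1], offset=1073154, length=237],offset=0,name=RandomFile1154.txt,"
    "size=237]] (237 bytes)",
    "2020-09-13 06:09:01,910 INFO Successfully sent [StandardFlowFileRecord["
    "uuid=f74eb941-a233-4f9e-86ff-07723940f012,claim=StandardContentClaim "
    "[resourceClaim=StandardResourceClaim[id=1599937413340-1, container=default, "
    "section=1], offset=1109014, length=237],offset=0,name=RandomFile1154.txt,"
    "size=237], StandardFlowFileRecord["
    "uuid=522a4350-4cab-476c-a087-a3793101412e,claim=StandardContentClaim "
    "[resourceClaim=StandardResourceClaim[id=1599937413340-1, container=default, "
    "section=1], offset=1074338, length=235],offset=0,name=RandomFile1346.txt,"
    "size=235]] (472 bytes)",
]

for name, uuids in names_sent_more_than_once(LOG).items():
    print(f"{name}: sent {len(uuids)} times under {len(set(uuids))} distinct uuid(s)")
# prints: RandomFile1154.txt: sent 2 times under 2 distinct uuid(s)
```

sent_records() also returns the claim tuple, so the content claims of the two records can be compared side by side.
    -->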
    <item>
      <title>Re: MiNiFi is sending duplicate files to NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/MiNiFi-is-sending-duplicate-files-to-NiFi/m-p/302804#M221260</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/74188"&gt;@Umakanth&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;From the log lines you shared, we can see two things:&lt;BR /&gt;&lt;BR /&gt;1. "LOG 1" shows "&lt;SPAN&gt;StandardFlowFileRecord[&lt;STRONG&gt;uuid=345d9b6d-e9f7-4dd8-ad9a-a9d66fdfd902&lt;/STRONG&gt;" and "LOG 2" shows "Successfully sent [StandardFlowFileRecord[uuid=&lt;STRONG&gt;f74eb941-a233-4f9e-86ff-07723940f012&lt;/STRONG&gt;". This tells us that the two "&lt;STRONG&gt;RandomFile1154.txt&lt;/STRONG&gt;" entries are two different FlowFiles. So it does not look like the RPG sent the same FlowFile twice; rather, it sent two FlowFiles that each reference the same content. I am not sure how you have your LogAttribute processor configured, but you should look for the log output produced for these two UUIDs to learn more about these two FlowFiles. From your comments, I suspect you will find only one of them passed through your LogAttribute processor.&lt;BR /&gt;&lt;BR /&gt;
2. We can see from both logs that the two FlowFiles above actually point at the exact same content in the content_repository:&lt;BR /&gt;"LOG 1" --&amp;gt; claim=StandardContentClaim [resourceClaim=StandardResourceClaim[&lt;STRONG&gt;id=1599937413340-1, container=default, section=1], offset=1073154, length=237&lt;/STRONG&gt;],offset=0,name=&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;RandomFile1154.txt,&lt;/STRONG&gt;&lt;/FONT&gt;size=237]&lt;BR /&gt;"LOG 2" --&amp;gt; claim=StandardContentClaim [resourceClaim=StandardResourceClaim[&lt;STRONG&gt;id=1599937413340-1, container=default, section=1], offset=1109014, length=237&lt;/STRONG&gt;],offset=0,name=&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;RandomFile1154.txt,&lt;/STRONG&gt;&lt;/FONT&gt;size=237]&lt;BR /&gt;&lt;BR /&gt;This typically happens when a FlowFile is cloned somewhere in your dataflow, for example when the same relationship of a processor is assigned to two connections.&lt;BR /&gt;&lt;BR /&gt;
Since you saw that GetFile ingested the file only once, that rules out GetFile as the source of this duplication. Moreover, had it been GetFile, you would not have seen identical claim information. LogAttribute has only a single "success" relationship, so if you had drawn two connections with the "success" relationship defined on both, you would have seen duplicates of every ingested file. So this seems unlikely as well. Next you have your PutFile processor, which has both "success" and "failure" relationships. I suspect the "success" relationship is assigned to the connection going to your Remote Process Group, and the "failure" relationship is assigned to a connection that loops back to PutFile itself (?). Now, if you had accidentally drawn the "failure" connection twice (one may be stacked on top of the other), any time a FlowFile failed in PutFile it would have been routed to one failure connection and cloned to the other. Then, once both were processed successfully by PutFile, you would end up with the original and the clone both sent to your RPG.&lt;BR /&gt;&lt;BR /&gt;Hope this helps,&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
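      <!--
The cloning mechanism described in the reply can be illustrated with a toy Python model. This is not NiFi code. The FlowFile class, route() and the connection names are hypothetical. The model only mirrors two behaviours:

- When a relationship is attached to more than one connection, the original FlowFile goes down one connection and a clone goes down each of the others.
- A clone gets a new uuid but still references the same content claim.

```python
from dataclasses import dataclass, replace
from uuid import uuid4


@dataclass(frozen=True)
class FlowFile:
    uuid: str
    name: str
    claim: tuple  # (resource claim id, offset, length) of the content


def route(flowfile, connections):
    """Toy model of routing one FlowFile to a relationship.

    The first connection receives the original; every additional connection
    attached to the same relationship receives a clone with a fresh uuid that
    still references the same content claim.
    """
    delivered = []
    for i, connection in enumerate(connections):
        copy = flowfile if i == 0 else replace(flowfile, uuid=str(uuid4()))
        delivered.append((connection, copy))
    return delivered


# PutFile fails once, and its "failure" relationship is (accidentally) attached
# to two stacked self-loop connections. Names and claim values are made up.
original = FlowFile(str(uuid4()), "RandomFile1154.txt", ("claim-1", 0, 237))
queued = route(original, ["failure loop A", "failure loop B"])

# Both copies are retried, succeed in PutFile and reach the RPG.
sent_to_rpg = [ff for _connection, ff in queued]
for ff in sent_to_rpg:
    print(ff.uuid, ff.name, ff.claim)
```

Running it prints two lines with the same name and content claim but different uuids. These are two distinct FlowFiles, not one FlowFile sent twice.
      -->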
      <pubDate>Mon, 14 Sep 2020 16:20:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/MiNiFi-is-sending-duplicate-files-to-NiFi/m-p/302804#M221260</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2020-09-14T16:20:19Z</dc:date>
    </item>
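    <!--
The reply suspects a duplicated connection in the flow. One way to look for it without opening the flow in a UI is to scan the flow definition for any relationship that feeds more than one connection. The Python sketch below does this under several assumptions:

- MiNiFi's config.yml has already been loaded into a dict, for example with PyYAML's yaml.safe_load.
- Each entry under "Connections" uses the keys "source id", "source relationship names" and "destination id". These key names are assumed from the MiNiFi config.yml schema, so verify them against your own file and schema version.
- The function name and the readable ids are hypothetical. Real ids are UUIDs.

```python
from collections import defaultdict


def relationships_with_multiple_connections(config):
    """Return {(source id, relationship): [destination ids]} for every
    relationship attached to more than one connection.

    NiFi clones a FlowFile for each extra connection on the relationship it
    is routed to. Fan-out to different destinations can be intentional;
    the same destination listed twice usually means a stacked duplicate.
    """
    destinations = defaultdict(list)
    for connection in config.get("Connections", []):
        for relationship in connection.get("source relationship names", []):
            key = (connection.get("source id"), relationship)
            destinations[key].append(connection.get("destination id"))
    return {key: dests for key, dests in destinations.items() if len(dests) > 1}


# Hypothetical, already-parsed config.yml for the flow in the question,
# with readable names standing in for the real UUID ids.
config = {
    "Connections": [
        {"source id": "GetFile", "source relationship names": ["success"],
         "destination id": "LogAttribute"},
        {"source id": "LogAttribute", "source relationship names": ["success"],
         "destination id": "PutFile"},
        {"source id": "PutFile", "source relationship names": ["success"],
         "destination id": "Input_Port_GES"},
        # Two stacked "failure" self-loops on PutFile.
        {"source id": "PutFile", "source relationship names": ["failure"],
         "destination id": "PutFile"},
        {"source id": "PutFile", "source relationship names": ["failure"],
         "destination id": "PutFile"},
    ]
}

print(relationships_with_multiple_connections(config))
# prints: {('PutFile', 'failure'): ['PutFile', 'PutFile']}
```

An empty result means no relationship fans out, so cloning from duplicated connections can be ruled out for that flow.
    -->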
  </channel>
</rss>

