<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to know if files are copied to hdfs? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110304#M46612</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt; Yes, but with the "success" relationship I would only know that the current (single) flowfile has been written successfully to HDFS. How would I know when all my files have finished processing, exactly once?&lt;/P&gt;</description>
    <pubDate>Sat, 19 Nov 2016 17:22:55 GMT</pubDate>
    <dc:creator>karthikprasad44</dc:creator>
    <dc:date>2016-11-19T17:22:55Z</dc:date>
    <item>
      <title>How to know if files are copied to hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110301#M46609</link>
      <description>&lt;P&gt;I am trying to copy files from my local machine to a remote HDFS cluster. I am using GetFile -&amp;gt; PutHDFS processors.&lt;/P&gt;&lt;P&gt;My exact use case is:&lt;/P&gt;&lt;P&gt;- &lt;STRONG&gt;I want to know as soon as the copy is done (currently I am using the REST API to track bytes transferred to know this)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;- Copy just once&lt;/P&gt;&lt;P&gt;- Keep the source files&lt;/P&gt;&lt;P&gt;Problems I am getting:&lt;/P&gt;&lt;P&gt;- If I configure GetFile to keep the source files and set the scheduling time to 0 secs, the GetFile processor creates flowfiles again and again for the same files&lt;/P&gt;&lt;P&gt;- I don't think I should set the scheduling time to a large value, as each task processes only one file and waits for the next schedule&lt;/P&gt;&lt;P&gt;Please help. &lt;/P&gt;&lt;P&gt;Open to trying other approaches,&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Nov 2016 15:48:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110301#M46609</guid>
      <dc:creator>karthikprasad44</dc:creator>
      <dc:date>2016-11-18T15:48:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to know if files are copied to hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110302#M46610</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/13834/karthikprasad4444.html" nodeid="13834"&gt;@Karthik Manchala&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;To achieve what you are looking for, I'd replace the GetFile processor with the combination of ListFile and FetchFile processors. The first one lists files according to your conditions and emits an empty flow file for each listed file, with an attribute containing the path of the file to retrieve. The second one actually fetches the content of the file at the given path. The first processor has a "state" and keeps information about already processed files so that it won't consume the same file multiple times. Besides, this approach is also recommended to allow better load distribution when you have a NiFi cluster.&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Nov 2016 16:29:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110302#M46610</guid>
      <dc:creator>pvillard</dc:creator>
      <dc:date>2016-11-18T16:29:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to know if files are copied to hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110303#M46611</link>
      <description>&lt;P&gt;In addition to &lt;A rel="user" href="https://community.cloudera.com/users/5078/pvillard.html" nodeid="5078"&gt;@Pierre Villard&lt;/A&gt;'s suggestion, &lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/index.html"&gt;PutHDFS&lt;/A&gt; transfers flow files that have been successfully written to HDFS to the "success" relationship, so you can put a processor downstream from PutHDFS (along the "success" relationship). At that point you can be sure that the file has been successfully written to HDFS, and can proceed accordingly.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Nov 2016 20:10:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110303#M46611</guid>
      <dc:creator>mburgess</dc:creator>
      <dc:date>2016-11-18T20:10:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to know if files are copied to hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110304#M46612</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt; Yes, but with the "success" relationship I would only know that the current (single) flowfile has been written successfully to HDFS. How would I know when all my files have finished processing, exactly once?&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2016 17:22:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110304#M46612</guid>
      <dc:creator>karthikprasad44</dc:creator>
      <dc:date>2016-11-19T17:22:55Z</dc:date>
    </item>
    <item>
      <title>Re: How to know if files are copied to hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110305#M46613</link>
      <description>&lt;DIV&gt;&lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt; Did you find a solution to check whether all files were copied successfully?&lt;/DIV&gt;</description>
      <pubDate>Mon, 21 May 2018 15:46:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-know-if-files-are-copied-to-hdfs/m-p/110305#M46613</guid>
      <dc:creator>vipul_bhardwaj</dc:creator>
      <dc:date>2018-05-21T15:46:19Z</dc:date>
    </item>
  </channel>
</rss>

