<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to read data from a file from Remote FTP Server and load the data into Hadoop using NIFI? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-data-from-a-file-from-Remote-FTP-Server-and-load/m-p/140923#M48163</link>
    <description>&lt;P&gt;This is very straightforward with NiFi -- it is a very common use case.&lt;/P&gt;&lt;P&gt;If the new data arrives as entire files, use the GetFTP (or GetSFTP) processor and configure the FTP host and port, the remote path, a filename regex, the polling frequency, and whether to delete the original (you can always archive it by forking the flow to another processor). It is easy to configure, implement, and monitor.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetSFTP/" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetSFTP/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If the new data consists of new lines appended to existing files (like log files), the setup is similar, but use TailFile, which picks up the lines added since the last poll.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.TailFile/" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.TailFile/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;On the put side, use the PutHDFS processor. Download core-site.xml and hdfs-site.xml from your cluster, place them in a directory on your NiFi cluster, and reference that path in the processor configuration. Then configure the target HDFS path (the XML files hold all connection details) for the file; you may want to append a unique timestamp or UUID to the filename to distinguish repeated ingests of identically named files.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 07 Dec 2016 19:56:54 GMT</pubDate>
    <dc:creator>gkeys</dc:creator>
    <dc:date>2016-12-07T19:56:54Z</dc:date>
    <item>
      <title>How to read data from a file from Remote FTP Server and load the data into Hadoop using NIFI?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-data-from-a-file-from-Remote-FTP-Server-and-load/m-p/140922#M48162</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I want to load &lt;STRONG&gt;real-time data&lt;/STRONG&gt; (a text file) containing &lt;STRONG&gt;incremental data from an FTP server into Hadoop&lt;/STRONG&gt;. I tried &lt;STRONG&gt;Flume&lt;/STRONG&gt;, but I am getting a &lt;STRONG&gt;FileNotFoundException&lt;/STRONG&gt;, and I am planning to &lt;STRONG&gt;use NiFi&lt;/STRONG&gt; to &lt;STRONG&gt;load the data from the FTP server into Hadoop&lt;/STRONG&gt;. Has anyone tried loading data from a single file on an FTP server into Hadoop? Any help would be appreciated.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Dec 2016 12:56:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-data-from-a-file-from-Remote-FTP-Server-and-load/m-p/140922#M48162</guid>
      <dc:creator>r_mageshkumar</dc:creator>
      <dc:date>2016-12-07T12:56:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to read data from a file from Remote FTP Server and load the data into Hadoop using NIFI?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-data-from-a-file-from-Remote-FTP-Server-and-load/m-p/140923#M48163</link>
      <description>&lt;P&gt;This is very straightforward with NiFi -- it is a very common use case.&lt;/P&gt;&lt;P&gt;If the new data arrives as entire files, use the GetFTP (or GetSFTP) processor and configure the FTP host and port, the remote path, a filename regex, the polling frequency, and whether to delete the original (you can always archive it by forking the flow to another processor). It is easy to configure, implement, and monitor.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetSFTP/" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.GetSFTP/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If the new data consists of new lines appended to existing files (like log files), the setup is similar, but use TailFile, which picks up the lines added since the last poll.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.TailFile/" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.TailFile/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;On the put side, use the PutHDFS processor. Download core-site.xml and hdfs-site.xml from your cluster, place them in a directory on your NiFi cluster, and reference that path in the processor configuration. Then configure the target HDFS path (the XML files hold all connection details) for the file; you may want to append a unique timestamp or UUID to the filename to distinguish repeated ingests of identically named files.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Dec 2016 19:56:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-read-data-from-a-file-from-Remote-FTP-Server-and-load/m-p/140923#M48163</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-12-07T19:56:54Z</dc:date>
    </item>
  </channel>
</rss>