<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: what is the best way to get ftp file to hdfs continusly ? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/309775#M223914</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN class=""&gt;&lt;A href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/68518" target="_blank" rel="noopener"&gt;ravikirandasar1&lt;/A&gt;,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;I have the same question. Could you please let me know how you automated this job with crontab so that the files are downloaded to HDFS every day?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 15 Jan 2021 07:54:19 GMT</pubDate>
    <dc:creator>Amoli</dc:creator>
    <dc:date>2021-01-15T07:54:19Z</dc:date>
    <item>
      <title>what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229150#M191007</link>
      <description>&lt;P&gt;I want to get FTP files into HDFS. On the FTP server, files are created in a new date-named directory every day, and I need to automate this job. What would be the best way to do this?&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2018 19:29:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229150#M191007</guid>
      <dc:creator>ravikirandasar1</dc:creator>
      <dc:date>2018-02-20T19:29:12Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229151#M191008</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/48397/ravikirandasari1.html" nodeid="48397"&gt;@Ravikiran Dasari&lt;/A&gt;&lt;P&gt;Have you tried looking at NiFi and its capabilities? NiFi provides a lot of processors that can help you automate your tasks and create a flow for performing them. You can create a flow to pick up data from a source and dump it in a different location. You can check the following example, written by one of our NiFi experts, &lt;A rel="user" href="https://community.cloudera.com/users/525/mclark.html" nodeid="525"&gt;@Matt Clarke&lt;/A&gt;, on how you can use NiFi to pull data from an FTP server.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/97773/how-to-retrieve-files-from-a-sftp-server-using-nif.html"&gt;How-to: Retrieve files from a SFTP server using NiFi (GetSFTP vs. ListSFTP)&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Along with the processors mentioned in the article above, you can use the PutHDFS processor, explained in the docs below, to dump the data into HDFS.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.5.0/org.apache.nifi.processors.hadoop.PutHDFS/index.html"&gt;PutHDFS - NiFi docs&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope this helps&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2018 19:50:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229151#M191008</guid>
      <dc:creator>Schandhok</dc:creator>
      <dc:date>2018-02-20T19:50:59Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229152#M191009</link>
      <description>&lt;P&gt;Are you asking about the scheduling or about how to script it? Should the files be copied to HDFS as soon as they arrive, or at a fixed frequency, e.g. daily or hourly?&lt;/P&gt;&lt;P&gt;The best way depends on the tools and knowledge you have. It could be done with a plain shell script, but also with NiFi. Spark also has an FTP connector.&lt;/P&gt;&lt;P&gt;Here is a post on how to solve it with NiFi: &lt;A href="https://community.hortonworks.com/questions/70261/how-to-read-data-from-a-file-from-remote-ftp-serve.html" target="_blank"&gt;https://community.hortonworks.com/questions/70261/how-to-read-data-from-a-file-from-remote-ftp-serve.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2018 19:51:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229152#M191009</guid>
      <dc:creator>arald</dc:creator>
      <dc:date>2018-02-20T19:51:51Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229153#M191010</link>
      <description>&lt;P&gt;I posted my answer in parallel, without noticing yours. Sorry for the redundant info.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2018 19:53:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229153#M191010</guid>
      <dc:creator>arald</dc:creator>
      <dc:date>2018-02-20T19:53:40Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229154#M191011</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/48397/ravikirandasari1.html" nodeid="48397"&gt;@Ravikiran Dasari&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If this is just for learning purposes, then what I'm going to say adds nothing to the previous answers. But if you are looking for something work-related, this answer might help a bit.&lt;/P&gt;&lt;P&gt;Have a file watcher that looks for files matching a particular pattern, which have to be FTP'ed to the desired location. Once a file arrives, you can move it to the HDFS server. This can be accomplished with a simple shell script, which requires only basic shell knowledge. It can also be done with either a push or a pull approach. If you have downstream jobs that must be executed once the file arrives in HDFS, I would recommend the pull approach, so that you can execute any other Hadoop/Hive/Pig/Spark jobs on the HDFS server.&lt;/P&gt;&lt;P&gt;Hope it helps!&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2018 20:33:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229154#M191011</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2018-02-20T20:33:24Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229155#M191012</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/20278/haraldberghoff.html" nodeid="20278"&gt;@Harald Berghoff&lt;/A&gt;, thanks for the solution. NiFi is not available in my cluster, so I have to do this with a shell script; otherwise I could go for Flume. How would I do it with a shell script? I can get the files manually, but I want to automate this. My manual process is generally as follows:&lt;/P&gt;&lt;P&gt;step1:&lt;/P&gt;&lt;P&gt;sftp ayosftpuser@IPaddress&lt;/P&gt;&lt;P&gt;password&lt;/P&gt;&lt;P&gt;step2:&lt;/P&gt;&lt;P&gt;cd /sourcedir&lt;/P&gt;&lt;P&gt;step3: in the above directory a new directory is created every day, and some files are dropped into it.&lt;/P&gt;&lt;P&gt;get -Pr 2018-02-26&lt;/P&gt;&lt;P&gt;bye&lt;/P&gt;&lt;P&gt;step4:&lt;/P&gt;&lt;P&gt;hadoop fs -put -f 2018-02-26 /destination&lt;/P&gt;&lt;P&gt;I need to automate this.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 15:05:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229155#M191012</guid>
      <dc:creator>ravikirandasar1</dc:creator>
      <dc:date>2018-02-27T15:05:03Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229156#M191013</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12437/balavigneshnagamuthuvenkatesan.html" nodeid="12437"&gt;@Bala Vignesh N V&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for the solution.&lt;/P&gt;&lt;P&gt;I need to implement this. How would I do it with a shell script? I can get the files manually, but I want to automate this. My manual process is generally as follows:&lt;/P&gt;&lt;P&gt;step1:&lt;/P&gt;&lt;P&gt;sftp ayosftpuser@IPaddress&lt;/P&gt;&lt;P&gt;password&lt;/P&gt;&lt;P&gt;step2:&lt;/P&gt;&lt;P&gt;cd /sourcedir&lt;/P&gt;&lt;P&gt;step3: in the above directory a new directory is created every day, and some files are dropped into it.&lt;/P&gt;&lt;P&gt;get -Pr 2018-02-26&lt;/P&gt;&lt;P&gt;bye&lt;/P&gt;&lt;P&gt;step4:&lt;/P&gt;&lt;P&gt;hadoop fs -put -f 2018-02-26 /destination&lt;/P&gt;&lt;P&gt;I need to automate this.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 15:48:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229156#M191013</guid>
      <dc:creator>ravikirandasar1</dc:creator>
      <dc:date>2018-02-27T15:48:32Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229157#M191014</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/48397/ravikirandasari1.html" nodeid="48397"&gt;@Ravikiran Dasari&lt;/A&gt;: do you have any experience with shell scripts? Do you know how to write a bash script, or maybe a Python script?&lt;BR /&gt;Do you use a special scheduler in your environment, or will you use cron? If you don't know, I guess it will be cron. Check whether you are able to edit the crontab by entering this command in the shell:&lt;/P&gt;&lt;PRE&gt;crontab -e&lt;/PRE&gt;&lt;P&gt;You will get either a list of cron jobs or an error message like 'You (&amp;lt;&amp;lt;userid&amp;gt;&amp;gt;) are not allowed to use this program (crontab)'.&lt;/P&gt;Now, when you want to write a shell script, the starting point is a simple text file containing the commands you would otherwise enter in the shell. The script file should start with a line known as the 'shebang', which specifies the script interpreter to be used. For example, on RedHat:&lt;BR /&gt;&lt;UL&gt;&lt;LI&gt;bash: #!/bin/bash&lt;/LI&gt;&lt;LI&gt;python: #!/usr/bin/python&lt;/LI&gt;&lt;LI&gt;perl: #!/usr/bin/perl&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you decide to go for a bash script, just create a file like this (you can use a different editor if you like):&lt;/P&gt;&lt;PRE&gt;vi ~/mycopyscript&lt;/PRE&gt;&lt;P&gt;Enter all your commands in that script:&lt;/P&gt;&lt;PRE&gt;#!/bin/bash

# directory name for today, e.g. 2018-02-26 (no spaces around "=" in bash)
dir=$(date +%Y-%m-%d)
sftp ayosftpuser@IPaddress &amp;lt;&amp;lt; __MY_FTP_COMMANDS__
password
cd /sourcedir
get -Pr ${dir}
bye
__MY_FTP_COMMANDS__

#at this point the files should already be locally copied
hadoop fs -put -f "${dir}" /destination
&lt;/PRE&gt;&lt;P&gt;Save the script (by entering &amp;lt;ESC&amp;gt;:wq in vi). Next, make the script executable, and only allow access for the owner (you):&lt;/P&gt;&lt;PRE&gt;chmod 700 ~/mycopyscript&lt;/PRE&gt;&lt;P&gt;You should be able to execute it now:&lt;/P&gt;&lt;PRE&gt;~/mycopyscript&lt;/PRE&gt;&lt;P&gt;This script is just a starting point and is kept plain and simple: there is no error handling and no security (whoever reads the script also has access to the password), and it takes no parameter (you must execute it on the date that the directory is named after).&lt;/P&gt;&lt;P&gt;Still, it should give you the basic idea of a shell script.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 16:51:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229157#M191014</guid>
      <dc:creator>arald</dc:creator>
      <dc:date>2018-02-27T16:51:08Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229158#M191015</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/20278/haraldberghoff.html" nodeid="20278"&gt;@Harald Berghoff&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I am using only crontab for scheduling the jobs. I also tried it your way, but it is prompting for the password. How can I supply the password from a separate script, and how do I handle errors? If you don't mind, could I have a more capable script that handles errors and security?&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 19:44:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229158#M191015</guid>
      <dc:creator>ravikirandasar1</dc:creator>
      <dc:date>2018-02-27T19:44:10Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229159#M191016</link>
      <description>&lt;P&gt;OK, the error handling can be implemented this way:&lt;/P&gt;&lt;PRE&gt;...
__MY_FTP_COMMANDS__

# capture the exit status of sftp (no spaces around "=" in bash)
ret_ftp=$?
if [ "${ret_ftp}" -eq 0 ]
  then
    #if you have a logging facility you probably want to use it to log the status
    echo "Files successfully transferred"
  else
    echo "Error in file transfer"
    # "exit" rather than "return", since this runs at script level, not in a function
    exit "${ret_ftp}"
fi

#at this point the files should already be locally copied
hadoop fs -put -f "${dir}" /destination


ret_hdfs=$?
#Put a similar handling here
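# One possible version of that handling, assuming the same echo-based
# reporting as above (a sketch, not part of the original answer):
if [ "${ret_hdfs}" -eq 0 ]
  then
    echo "Files successfully copied to HDFS"
  else
    echo "Error copying files to HDFS"
    exit "${ret_hdfs}"
fi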
&lt;/PRE&gt;&lt;P&gt;As for the password: SFTP, like SSH, is a little tricky here. To get rid of the password prompt, I would recommend exchanging SSH keys.&lt;BR /&gt;Once this is working, you can add the scheduled execution of the script to your crontab.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 20:52:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/229159#M191016</guid>
      <dc:creator>arald</dc:creator>
      <dc:date>2018-02-27T20:52:05Z</dc:date>
    </item>
    <item>
      <title>Re: what is the best way to get ftp file to hdfs continusly ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/309775#M223914</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN class=""&gt;&lt;A href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/68518" target="_blank" rel="noopener"&gt;ravikirandasar1&lt;/A&gt;,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;I have the same question. Could you please let me know how you automated this job with crontab so that the files are downloaded to HDFS every day?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jan 2021 07:54:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/what-is-the-best-way-to-get-ftp-file-to-hdfs-continusly/m-p/309775#M223914</guid>
      <dc:creator>Amoli</dc:creator>
      <dc:date>2021-01-15T07:54:19Z</dc:date>
    </item>
  </channel>
</rss>

