
What is the best way to get FTP files into HDFS continuously?

I want to get FTP files into HDFS. On the FTP server, files are created in a new date directory every day, and I need to automate this job. What would be the best way to do this?



Hi @Ravikiran Dasari

If this is for knowledge purposes, what I'm going to give has no more information than the previous answers. But if you are looking for something related to work, this answer might help a bit.

Set up a file watcher that looks for files matching a particular pattern, i.e. the files that have to be FTP'd to the desired location. Once a file arrives, you can move it to the HDFS server. A simple shell script can do this, and it requires nothing more than basic shell knowledge. The transfer can be done either as a push (the FTP side sends the file) or as a pull (the Hadoop side fetches it). If other downstream jobs have to run once the file lands in HDFS, I would recommend the pull approach, so that you can then run your Hadoop/Hive/Pig/Spark jobs from the HDFS server.
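A minimal sketch of the pull approach in Python, meant to run on a machine where the `hdfs` command-line client is available (for example a Hadoop edge node). It assumes, hypothetically, that the daily directories are named like `20160305` (`%Y%m%d`), that the watched files match a glob such as `*.csv`, and that the caller keeps track of file names already loaded. The function names, host, credentials and paths are illustrative placeholders, not part of the original answer. It uses only `ftplib` from the standard library plus the `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` commands.

```python
import datetime
import fnmatch
import os
import posixpath
import subprocess
from ftplib import FTP

# Assumption: the FTP server names its daily directories like 20160305.
DATE_DIR_FORMAT = "%Y%m%d"


def date_dir(day: datetime.date, fmt: str = DATE_DIR_FORMAT) -> str:
    """Return the (assumed) name of the FTP date directory for a given day."""
    return day.strftime(fmt)


def select_files(names, pattern, already_loaded):
    """Keep names matching the watcher's glob pattern that were not loaded yet."""
    picked = []
    for name in names:
        base = posixpath.basename(name)  # some servers return full paths from NLST
        if fnmatch.fnmatch(base, pattern) and base not in already_loaded:
            picked.append(base)
    return sorted(picked)


def hdfs_put_command(local_path, hdfs_dir):
    """Build the `hdfs dfs -put` call; -f overwrites an existing copy in HDFS."""
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def run_once(host, user, password, remote_root, pattern, staging_dir, hdfs_root,
             day=None, already_loaded=frozenset()):
    """Pull one day's matching files from FTP and load them into HDFS.

    Every argument is a hypothetical placeholder for your own environment.
    Returns the list of file names loaded during this run.
    """
    day = day or datetime.date.today()
    remote_dir = posixpath.join(remote_root, date_dir(day))
    hdfs_dir = posixpath.join(hdfs_root, date_dir(day))
    os.makedirs(staging_dir, exist_ok=True)
    # Make sure the matching date directory exists in HDFS.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)

    loaded = []
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in select_files(ftp.nlst(), pattern, already_loaded):
            local_path = os.path.join(staging_dir, name)
            # Download to the local staging area, then push into HDFS.
            with open(local_path, "wb") as fh:
                ftp.retrbinary("RETR " + name, fh.write)
            subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)
            loaded.append(name)
    return loaded
```

Calling `run_once(...)` from cron every few minutes, and persisting the returned names so they can be passed back as `already_loaded` on the next run, approximates the continuous file watcher described above. The sketch does not detect files that are still being written on the FTP side; a completion marker file or a size check would be needed for that.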

Hope it helps!!

Hi ravikirandasar1,


I also have the same question. Could you please let me know how you automated this job using crontab for the daily download of the files to HDFS?