What is the best way to get FTP files into HDFS continuously?

Rising Star

I want to load files from an FTP server into HDFS. On the FTP server, files are created in a new date directory every day, and I need to automate this job. What is the best way to do this?

1 ACCEPTED SOLUTION


Hi @Ravikiran Dasari

If this is just for learning, what I'm going to say adds nothing beyond the previous answers. But if you need this for work, this answer might help a bit.

Set up a file watcher that looks for files matching a particular pattern and transfers them over FTP to the desired location. Once a file arrives, you can move it to the HDFS server. This can be done with a simple shell script and needs nothing more than basic shell knowledge. It can be done with either a push or a pull approach. If you have downstream jobs that must run once the file arrives in HDFS, I would recommend the pull approach, so that you can run any other Hadoop/Hive/Pig/Spark jobs on the HDFS server.
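
A minimal sketch of the pull approach, run on a node that has the Hadoop client installed. Everything specific here is an assumption for illustration and not something from the original setup: the FTP URL, the staging and HDFS paths, the YYYY-MM-DD directory naming, the file pattern, and the `--run` switch are all hypothetical. It also assumes `wget` is available and that FTP credentials come from `~/.netrc`.

```shell
#!/usr/bin/env bash
# Pull-style sketch: fetch one day's FTP directory, then load it into HDFS.
# All hosts, paths, patterns and the date format below are assumptions.
set -euo pipefail

FTP_URL="${FTP_URL:-ftp://ftp.example.com/data}"   # hypothetical FTP base URL
FILE_PATTERN="${FILE_PATTERN:-*.csv}"              # hypothetical file name pattern
LOCAL_STAGE="${LOCAL_STAGE:-/tmp/ftp_stage}"       # hypothetical local staging dir
HDFS_BASE="${HDFS_BASE:-/landing/ftp}"             # hypothetical HDFS target dir

# Date directory to process: the given argument, or today as YYYY-MM-DD.
day_dir() {
  printf '%s\n' "${1:-$(date +%Y-%m-%d)}"
}

# Download matching files of one date directory into local staging.
fetch_day() {
  local day="$1"
  mkdir -p "$LOCAL_STAGE/$day"
  # -r recurse, -nH/-nd drop host and remote dir names,
  # -N skip files that are already up to date, -A keep only matching names
  wget -q -r -nH -nd -N -A "$FILE_PATTERN" -P "$LOCAL_STAGE/$day" "$FTP_URL/$day/"
}

# Copy the staged files into HDFS; -f overwrites copies from earlier runs.
load_day() {
  local day="$1"
  local files=("$LOCAL_STAGE/$day"/*)
  if [[ ! -e "${files[0]}" ]]; then
    echo "no files staged for $day" >&2
    return 0
  fi
  hdfs dfs -mkdir -p "$HDFS_BASE/$day"
  hdfs dfs -put -f "${files[@]}" "$HDFS_BASE/$day/"
}

main() {
  local day
  day="$(day_dir "${1:-}")"
  fetch_day "$day"
  load_day "$day"
  # Downstream Hadoop/Hive/Pig/Spark jobs could be started here.
}

# Only do work when asked explicitly, so the file can also be sourced.
if [[ "${1:-}" == "--run" ]]; then
  shift
  main "$@"
fi
```

To run it every day, a crontab entry along the lines of `0 2 * * * /path/to/ftp_to_hdfs.sh --run` (script path hypothetical) would pull the current day's directory at 02:00. Passing a date after `--run` reprocesses a specific day.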

Hope it helps!!


10 REPLIES

New Contributor

Hi ravikirandasar1,

I have the same question. Could you please let me know how you automated this job with crontab to download the files to HDFS every day?