I have a folder in HDFS that receives new files every day. I want to duplicate the folder so that whenever a new file arrives in the original folder, it is also copied to the duplicate folder.
Basically, I want to keep one HDFS folder in sync with another.
There is no scheduler built into distcp. You set up all your options and run the distcp job, and it runs once. To keep the folders in sync you need something else to run it on a schedule. Cron is my default, but this could be Oozie as well, or anything else that can schedule jobs.
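As a rough sketch of the cron approach, something like the wrapper script below could work. The two HDFS paths, the script location, and the hourly schedule are all made up for illustration, so substitute your own. It relies on distcp's `-update` option, which copies only files that are missing or different at the target, so each run should pick up just the new arrivals.

```shell
#!/bin/sh
# Hypothetical folders (assumptions, not from the question); replace them.
SRC="hdfs:///data/incoming"
DST="hdfs:///data/incoming_copy"

# One sync pass. -update copies only files that are missing or differ
# at the target, so repeated runs transfer just the new files.
sync_folders() {
    hadoop distcp -update "$SRC" "$DST"
}

# Run the copy only when called with "run", so the script can be
# sourced or inspected without starting a job.
if [ "${1:-}" = "run" ]; then
    sync_folders
fi

# Example crontab entry (hypothetical script path), once an hour:
# 0 * * * * /path/to/sync_hdfs.sh run >> /tmp/sync_hdfs.log 2>&1
```

Note that `-update` on its own does not remove files from the duplicate when they are deleted from the original. If you need that as well, distcp has a `-delete` option that can be combined with `-update`, but test it on a scratch folder first.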