We're migrating some data from our production cluster to our development cluster by running distcp from the development cluster. We plan to set up a script on the dev cluster that runs distcp periodically throughout the day to copy certain archive directories from the production cluster. The production cluster runs operational jobs throughout the day, and we don't want to interrupt them. Is it safe to run distcp from the dev cluster to migrate data from the production cluster while it is still running operational jobs?
The condition is that the archive directory we're copying is not accessed by any operational job; it's a passive directory used exclusively for storage.
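I guess that if you don't throttle the distCp jobs, then yes, they could affect performance. Even if no job touches the archive directory, the copy still reads its blocks from the production DataNodes and consumes disk I/O and network bandwidth that the operational jobs share.
But luckily, you can throttle the distCp command by limiting the number of concurrent maps (-m) and the bandwidth available to each map (-bandwidth, in MB/s).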
Check the DistCp documentation for the details of these options.
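A minimal Python sketch of how the scheduled script on the dev cluster might launch a throttled copy. It assumes `hadoop` is on the PATH; the NameNode hosts, port and `/data/archive` path are hypothetical placeholders, as are the helper names. `-m` caps the number of simultaneous map tasks and `-bandwidth` caps each map's rate in MB/s, so the copy's total rate is bounded by roughly their product. Adding `-update` is an assumption about what you want: on repeated runs it copies only files that are missing or differ at the destination (note that with `-update` the contents of the source directory are copied into the target path, not the directory itself).

```python
import shlex
import shutil
import subprocess


def build_distcp_command(src, dst, max_maps=10, mb_per_map=10, update=True):
    """Return the argument list for a throttled `hadoop distcp` run."""
    if max_maps < 1 or mb_per_map < 1:
        raise ValueError("max_maps and mb_per_map must be positive")
    cmd = ["hadoop", "distcp",
           "-m", str(max_maps),             # at most this many concurrent maps
           "-bandwidth", str(mb_per_map)]   # per-map limit in MB/s
    if update:
        cmd.append("-update")               # skip files already copied and unchanged
    cmd += [src, dst]
    return cmd


def max_aggregate_mb_per_s(max_maps, mb_per_map):
    """Approximate upper bound on the copy's total throughput."""
    return max_maps * mb_per_map


def run_distcp(cmd):
    """Run the command if hadoop is installed; otherwise just print it."""
    if shutil.which("hadoop") is None:
        print("hadoop not on PATH; would run:", shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Hypothetical cluster addresses; replace with your own NameNodes.
    cmd = build_distcp_command("hdfs://prod-nn:8020/data/archive",
                               "hdfs://dev-nn:8020/data/archive",
                               max_maps=5, mb_per_map=10)
    print(shlex.join(cmd))
    print("upper bound ~", max_aggregate_mb_per_s(5, 10), "MB/s")
    # Call run_distcp(cmd) from your scheduler (e.g. cron) to actually copy.
```

With 5 maps at 10 MB/s each, the copy should stay under roughly 50 MB/s in total; start low and raise the limits while watching the production cluster's load.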