I'm trying to do the initial DistCp run by creating snapshot S0 on the source and copying S0 to the backup cluster. Because the folder being copied contains more than 3,000,000 files and 70 TB, the log of the running DistCp job is flooding the application master's local file system. Is there a way to solve this? As a workaround, I'm thinking of running DistCp on the subfolders separately, then creating the S0 snapshot on the source and running DistCp on it. Any other smart ideas?
I believe there is no way to suppress the logs, because DistCp keeps a log of each file it attempts to copy as map output.
So there is no way to save the log to HDFS instead of the application master's local file system?
Can you think of any other workaround for this?
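For what it's worth, here is a rough sketch of the per-subfolder workaround described in the question, written as a small Python helper that only builds the command lines (it does not run anything). The cluster URIs, subfolder names, and log directory are made-up placeholders. `-update` and `-log <logdir>` are documented DistCp options, and `hdfs dfs -createSnapshot` is the standard snapshot command, but whether `-log` actually takes the pressure off the application master's local disk is an assumption that nobody in this thread has confirmed, so treat it as something to test on a small subfolder first.

```python
import shlex


def build_distcp_commands(src_root, dst_root, subfolders, log_root=None):
    """Build one `hadoop distcp` command (as an argv list) per subfolder.

    Splitting the copy keeps each job, and therefore each job's log, smaller.
    """
    src_root = src_root.rstrip("/")
    dst_root = dst_root.rstrip("/")
    commands = []
    for name in subfolders:
        # -update copies the contents of the source dir into the target dir
        # and skips files that are already identical, so reruns are safe.
        cmd = ["hadoop", "distcp", "-update"]
        if log_root is not None:
            # Assumption: a per-job log directory on the cluster file system.
            cmd += ["-log", f"{log_root.rstrip('/')}/{name}"]
        cmd += [f"{src_root}/{name}", f"{dst_root}/{name}"]
        commands.append(cmd)
    return commands


def build_snapshot_command(src_dir, snapshot_name="S0"):
    """Build the command that creates the snapshot on the source directory."""
    return ["hdfs", "dfs", "-createSnapshot", src_dir, snapshot_name]


if __name__ == "__main__":
    # Hypothetical paths and subfolder names, for illustration only.
    src = "hdfs://source-nn/data"
    dst = "hdfs://backup-nn/data"
    for cmd in build_distcp_commands(src, dst, ["2019", "2020"],
                                     log_root="hdfs://source-nn/tmp/distcp-logs"):
        print(shlex.join(cmd))
    print(shlex.join(build_snapshot_command("/data")))
```

Once every subfolder has been copied, creating S0 and running one more `hadoop distcp -update` from the `.snapshot/S0` path should only have to move whatever changed in the meantime, so that last job ought to copy far less data than the full run.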