Created on 02-10-2020 12:50 PM - last edited on 02-10-2020 02:52 PM by ask_bill_brooks
I have scheduled a replication job via Cloudera Manager to replicate an HDFS data directory from one datacenter to another. It is working as expected, but we noticed one unexpected behavior. When a user ran the Spark coalesce command on that directory to coalesce hundreds of files into two, it wrote 2 new files and deleted the hundreds of original files. After the replication job ran, we noticed that the 2 new files had been replicated, but the hundreds of original files had not been deleted on the target datacenter.
Any idea why the hundreds of files that were deleted from the source directory are not removed from the target directory by the replication job?
Note: I have enabled the delete policy (delete to trash) on the replication schedule.
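To make the mismatch concrete, here is a minimal sketch of how the leftover files could be identified. It assumes the file listings have already been collected separately on each cluster (for example with `hdfs dfs -ls`); the file names and the `stale_on_target` helper are hypothetical and only for illustration, not part of the replication job itself.

```python
def stale_on_target(source_files, target_files):
    """Return files that exist on the target but no longer exist on the source."""
    return sorted(set(target_files) - set(source_files))


# Hypothetical listings: the source holds only the 2 coalesced files,
# while the target still holds the old small files as well.
source = ["part-00000", "part-00001"]
target = [
    "part-00000",
    "part-00001",
    "old-part-00000",
    "old-part-00001",
    "old-part-00002",
]

leftover = stale_on_target(source, target)
print(leftover)
```

With the delete policy enabled, I would expect this list to be empty after a replication run.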