
Replicating HDFS data to another data center



I have scheduled a replication via Cloudera Manager to replicate an HDFS data directory from one data center to another. It is working as expected, but we noticed one discrepancy. When a user ran Spark's coalesce on that directory to combine hundreds of files into two, the job wrote 2 new files and deleted the hundreds of original files. After the replication job ran, the 2 new files were replicated, but the hundreds of original files were not deleted on the target data center.
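To show exactly which files are left behind, the two directory listings can be compared. Below is a minimal sketch in Python, assuming each listing was captured as text from `hdfs dfs -ls -R <dir>` on the respective cluster; the function names and the root paths passed in are hypothetical and only for illustration.

```python
def parse_ls_paths(ls_output, root):
    """Return file paths relative to `root` from `hdfs dfs -ls -R` style text."""
    root = root.rstrip("/")
    paths = set()
    for line in ls_output.splitlines():
        # Regular files start with '-' in the permissions column;
        # directories ('d...'), headers, and blank lines are skipped.
        if not line.startswith("-"):
            continue
        # Columns: perms, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        path = fields[7]
        if path.startswith(root + "/"):
            paths.add(path[len(root) + 1:])
    return paths


def stale_on_target(source_ls, source_root, target_ls, target_root):
    """List files that exist under the target root but not under the source root."""
    source_files = parse_ls_paths(source_ls, source_root)
    target_files = parse_ls_paths(target_ls, target_root)
    return sorted(target_files - source_files)
```

Any path returned by `stale_on_target` is a file that was removed on the source but still exists on the target after the replication run.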


Any idea why the hundreds of files that were deleted from the source directory are not removed from the target directory by the replication job?


Note: I have enabled the delete policy (delete to trash) on the replication schedule.


x_trans_day ="/data/mart/cp/elixir/rx_trans/TRANSACTION_DATE_ID=20190102")
Your help is very much appreciated.