Member since: 11-19-2020
Posts: 3
Kudos Received: 1
Solutions: 0
01-19-2021 02:01 AM
Hello @ashinde, what I mean is: if we delete data on the source CDP for whatever reason (purge, archiving, dataset rebuild), how do we capture those events and replicate the delete action on the target CDP? It seems that all of my proposals are only able to add data on the target.
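As a point of comparison for this delete question, DistCp has options that propagate deletions; here is a minimal sketch, assuming placeholder source and target URIs (-delete removes files that exist on the target but not on the source, and it only works together with -update or -overwrite):

    # Placeholder paths: point these at the actual source HDFS directory and target ADLS Gen2 container
    hadoop distcp -update -delete \
        hdfs://source-namenode:8020/data/mydataset \
        abfs://target-container@targetaccount.dfs.core.windows.net/data/mydataset

With -update only changed files are copied, and -delete makes the target mirror the source, so a purge or dataset rebuild on the source is reflected the next time the job runs.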
01-12-2021 08:20 AM (1 Kudo)
I have several ideas in mind:

1) NiFi process at the file system level: capture data with ListHDFS + FetchHDFS, then ingest it with PutHDFS. But what about file deletes? I was thinking of using GetHDFSEvents to capture "unlink" events and replicating them with DeleteHDFS, but it seems that GetHDFSEvents is not compatible with ADLS Gen2 storage.

2) DistCp: again, it seems to work for new data and updated files, but I don't understand how it can handle deletes (except if we drop the target data before a full copy).

3) AzCopy: only compatible with ADLS (but I imagine a similar tool is available for S3 buckets), using the "azcopy sync" option: https://docs.microsoft.com/fr-fr/azure/storage/common/storage-ref-azcopy-sync (a command sketch follows below).

4) NiFi process at the Hive level (not sure it is very elegant): capture data with SelectHive3QL (Avro output) and ingest it with PutHive3Streaming, but I am not sure how to manage deletes.

Any best practice or other better idea?
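Regarding idea 3, a minimal sketch of "azcopy sync" with deletion propagation enabled; the storage accounts, container paths and SAS tokens below are placeholders (--delete-destination=true removes objects on the target that no longer exist on the source):

    # Placeholder endpoints and SAS tokens: replace with the real source and target storage URLs
    azcopy sync \
        "https://sourceaccount.blob.core.windows.net/container/data?<source-sas>" \
        "https://targetaccount.blob.core.windows.net/container/data?<target-sas>" \
        --recursive --delete-destination=true

This keeps the copy logic entirely outside the cluster, but it only covers the storage layer; Hive metadata would still have to be synchronized separately.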
01-11-2021 10:53 AM
Hello, I am looking for the best solution to replicate data between CDP Public Cloud instances. I have found a proposal based on NiFi: https://community.cloudera.com/t5/Support-Questions/Looking-for-a-replacement-for-Hadoop-to-Hadoop-copy-distcp/m-p/140571/highlight/true#M103178 or using DistCp. I am not sure that either solution handles "data synchronization" properly (at least the NiFi solution does not seem able to handle file deletes; I am not sure about DistCp). What is the best way to proceed? Do you know if Cloudera Replication Manager will soon support the CDP-to-CDP scenario in the cloud?