Data loss at DR cluster during HBase replication from source cluster when table size is 2 TB; smaller tables replicate fine

New Contributor

Hi,

We are seeing data loss in the DR cluster when replicating HBase data from the prod cluster.

Source: Prod_Cluster. HBase replication is enabled (configured as true).

Dest: DR_Cluster

Data loads successfully into the source, but after replication some records are missing in the DR cluster.

Can someone let us know what the potential cause might be?

1 REPLY

Super Collaborator

Hello @Hadoop_Admin 

Thanks for using Cloudera Community. To reiterate: your team enabled replication from ClusterA (source) to ClusterB (target) and is seeing data loss, meaning the record counts on the source and target don't match. This is observed for a large table of ~2 TB.

Kindly confirm the process being used to compare the record count. Is VerifyRep (VerifyReplication) being used for this purpose?
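
For illustration only: VerifyReplication (the MapReduce job `org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication`, run with a peer id and table name) compares rows between the source and the peer and reports counters such as GOODROWS and BADROWS, which says more than comparing two totals. The Python sketch below is a hypothetical in-memory toy of that row-by-row idea, assuming both tables fit in dicts mapping row key to cell bytes; it is not how the job itself is implemented, and `RowDiff`/`compare_rows` are made-up names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RowDiff:
    """Row-level comparison result, loosely mirroring GOODROWS/BADROWS."""
    good: int = 0
    only_in_source: List[str] = field(default_factory=list)
    only_in_target: List[str] = field(default_factory=list)
    content_different: List[str] = field(default_factory=list)

    @property
    def bad(self) -> int:
        # Every row that is missing on one side or differs in content.
        return (len(self.only_in_source) + len(self.only_in_target)
                + len(self.content_different))


def compare_rows(source: Dict[str, bytes], target: Dict[str, bytes]) -> RowDiff:
    """Compare two tables row by row (hypothetical in-memory toy)."""
    diff = RowDiff()
    # Walk the union of row keys in sorted order, as a scan would.
    for key in sorted(source.keys() | target.keys()):
        if key not in target:
            diff.only_in_source.append(key)
        elif key not in source:
            diff.only_in_target.append(key)
        elif source[key] != target[key]:
            diff.content_different.append(key)
        else:
            diff.good += 1
    return diff
```

With `source = {"r1": b"a", "r2": b"b", "r3": b"c"}` and `target = {"r1": b"a", "r3": b"x", "r4": b"d"}`, both sides hold 3 rows, so a plain count comparison reports no loss, yet `compare_rows` finds one good row, `r2` only in the source, `r4` only in the target and `r3` with different content.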

Next, HBase replication is asynchronous, so some lag is expected while the source table is being loaded. Confirm whether the command [status 'replication'] reports any replication lag.
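
A minimal sketch for scanning that output, assuming it has been captured as text (for example by piping `status 'replication'` into `hbase shell`) and that each source is printed on a `SOURCE:` line of comma-separated `Key=Value` pairs such as `AgeOfLastShippedOp`, `SizeOfLogQueue` and, on versions that print it, `Replication Lag`. The exact layout differs between HBase versions, so check it against your own output; the function names and the `max_lag_ms` threshold are hypothetical, not HBase guidance.

```python
import re
from typing import Dict, List

# "Key=Value" pairs separated by commas; keys may contain spaces
# ("Replication Lag"). This layout is an assumption: check it against
# the output of your own HBase version.
_PAIR = re.compile(r"\s*([A-Za-z][A-Za-z ]*?)=([^,]*)")


def parse_source_lines(status_output: str) -> List[Dict[str, str]]:
    """Return one dict of metrics per 'SOURCE:' line."""
    sources = []
    for line in status_output.splitlines():
        line = line.strip()
        if not line.startswith("SOURCE:"):
            continue  # skip host headers, SINK lines, etc.
        body = line[len("SOURCE:"):]
        sources.append({k.strip(): v.strip() for k, v in _PAIR.findall(body)})
    return sources


def lagging_sources(status_output: str, max_lag_ms: int = 0) -> List[Dict[str, str]]:
    """Sources whose reported lag exceeds max_lag_ms (an arbitrary threshold)."""
    lagging = []
    for metrics in parse_source_lines(status_output):
        # Prefer "Replication Lag" when printed, else AgeOfLastShippedOp;
        # both are treated as milliseconds here (assumption).
        raw = metrics.get("Replication Lag", metrics.get("AgeOfLastShippedOp", "0"))
        if int(raw) > max_lag_ms:
            lagging.append(metrics)
    return lagging
```

Running `lagging_sources` repeatedly during the load and after it stops shows whether the reported lag drains back to zero or stays put.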

Next, we need to establish whether the row-count difference is static or dynamic during a period of no load on the source table (if feasible). If the source table has 100 rows and the target table has 90 rows and stays that way, we can treat those 10 rows as the real difference. If the target table shows 91, then 92, then 93... rows, we can assume replication is catching up.
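
The static-versus-catching-up check can be sketched as follows, assuming you take row counts from both clusters several times during the no-load window (for example with the RowCounter MapReduce job or the shell `count` command). The function name and the "inconclusive" bucket are hypothetical; any pattern that fits neither case above, such as a growing gap, lands there.

```python
from typing import Sequence, Tuple


def classify_gap(samples: Sequence[Tuple[int, int]]) -> str:
    """Classify the source-minus-target row-count gap over time.

    samples: (source_rows, target_rows) pairs in time order, all taken
    while the source table receives no writes.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to see a trend")
    gaps = [src - tgt for src, tgt in samples]
    if gaps[-1] == 0:
        return "in sync"
    # Gap never grows and has shrunk overall: replication is draining.
    if all(b <= a for a, b in zip(gaps, gaps[1:])) and gaps[-1] < gaps[0]:
        return "catching up"
    # Gap unchanged across every sample: likely genuinely missing rows.
    if all(g == gaps[0] for g in gaps):
        return "static"
    return "inconclusive"
```

Using the numbers from the reply, `classify_gap([(100, 90), (100, 90), (100, 90)])` returns `"static"` and `classify_gap([(100, 90), (100, 91), (100, 92), (100, 93)])` returns `"catching up"`.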

Finally, check whether any audit records show delete operations on the target table.

- Smarak
