Does HBase replication guarantee consistency or snapshot-in-time?

No. HBase replication works by shipping WAL edits from the source cluster to the destination, where they are replayed asynchronously. It is therefore eventually consistent and does not give you a point-in-time view.

For snapshot functionality, consider taking a Snapshot on the source cluster and exporting that to the destination using the ExportSnapshot tool.
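As a rough sketch of that approach, the snippet below only assembles the two commands involved; it does not run them. The table name, snapshot name, and destination URL are made-up placeholders, and it assumes the snapshot is taken from the HBase shell and that ExportSnapshot is run through the `hbase` launcher on the source cluster.

```python
def snapshot_shell_statement(table, snapshot_name):
    """HBase shell statement that takes a snapshot of `table`."""
    return f"snapshot '{table}', '{snapshot_name}'"


def export_snapshot_command(snapshot_name, dest_root_dir, mappers=None):
    """argv for the ExportSnapshot tool, copying to the destination HBase root dir."""
    argv = [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot_name,
        "-copy-to", dest_root_dir,
    ]
    if mappers is not None:
        argv += ["-mappers", str(mappers)]
    return argv


# Placeholder names; substitute your own table, snapshot, and destination.
print(snapshot_shell_statement("my_table", "my_table_snap"))
print(" ".join(export_snapshot_command("my_table_snap", "hdfs://dest-nn:8020/hbase")))
```

The exported snapshot can then be restored or cloned on the destination cluster, which gives a consistent copy as of the moment the snapshot was taken.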

Is there an option of enabling replication at the GLOBAL level rather than individually at the table level?

No. HBase replication can be enabled only at the table level or the column family level.

Does enabling replication cause the table to be disabled?

Check that hbase.online.schema.update.enable in hbase-site.xml is true. If it is, the table will not be disabled when table replication is turned on.

The default value is true, so if the property is not set in hbase-site.xml, it is treated as true.

Does HBase replication solve the problem of replicating a cluster which already has data to an empty cluster?

No. Replication only ships edits made after it is enabled, so on its own it does not copy pre-existing data. You will also need to use the CopyTable tool.

Here are steps to do that:

1. Start replication and take note of the current timestamp, T.

2. Run CopyTable with the end timestamp set to T from step 1.
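The two steps above can be sketched roughly as follows. This only builds the CopyTable command line rather than running it. The table name and the peer ZooKeeper address are placeholders, and it assumes T is recorded in epoch milliseconds, which is what HBase uses for cell timestamps by default.

```python
import time


def copy_table_command(table, peer_zk, end_ms):
    """argv for a CopyTable run that copies cells with timestamps before end_ms."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable",
        f"--endtime={end_ms}",
        # Peer address format: quorum:clientPort:znodeParent
        f"--peer.adr={peer_zk}",
        table,
    ]


# Step 1: enable replication, then record T right away (epoch milliseconds).
t = int(time.time() * 1000)

# Step 2: copy everything written before T to the destination cluster.
cmd = copy_table_command("my_table", "zk1,zk2,zk3:2181:/hbase", end_ms=t)
print(" ".join(cmd))
```

Edits written after T are left to replication, and edits before T are handled by CopyTable, so the two together cover the whole table.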

Is it true that HBase replication does not work correctly with out-of-order mutations?

Please review the following passage from the Apache HBase book, which explains that there is no guaranteed order of delivery for client edits: https://hbase.apache.org/book.html#_cluster_replication

How your application builds on top of the HBase API matters when replication is in play. HBase’s replication system provides at-least-once delivery of client edits for an enabled column family to each configured destination cluster. In the event of failure to reach a given destination, the replication system will retry sending edits in a way that might repeat a given message. Furthermore, there is not a guaranteed order of delivery for client edits. In the event of a RegionServer failing, recovery of the replication queue happens independent of recovery of the individual regions that server was previously handling. This means that it is possible for the not-yet-replicated edits to be serviced by a RegionServer that is currently slower to replicate than the one that handles edits from after the failure.

The combination of these two properties (at-least-once delivery and the lack of message ordering) means that some destination clusters may end up in a different state if your application makes use of operations that are not idempotent, e.g. Increments.
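To make that concrete, here is a toy model in plain Python. It is not HBase code, and it does not claim to show how HBase actually ships Increments; it simply treats a non-idempotent operation as a delta that gets applied each time it is delivered, and compares it with timestamped writes where the newest timestamp wins. The `deliver` function is a made-up stand-in for at-least-once, unordered delivery.

```python
import random


def deliver(edits, seed):
    """Toy at-least-once, unordered delivery: repeat some edits, then shuffle."""
    rng = random.Random(seed)
    delivered = edits + [e for e in edits if rng.random() < 0.5]
    rng.shuffle(delivered)
    return delivered


def replay_puts(edits):
    """Timestamped writes: the highest timestamp wins, so replays are harmless."""
    cell = None  # (timestamp, value)
    for ts, value in edits:
        if cell is None or ts >= cell[0]:
            cell = (ts, value)
    return cell[1]


def replay_deltas(edits):
    """Delta-style operations: every delivery is added again."""
    return sum(delta for _, delta in edits)


edits = [(1, 10), (2, 20), (3, 30)]  # (timestamp, value or delta)
source_put, source_sum = replay_puts(edits), replay_deltas(edits)

for seed in range(5):
    delivered = deliver(edits, seed)
    print(seed,
          "puts match:", replay_puts(delivered) == source_put,
          "deltas match:", replay_deltas(delivered) == source_sum)
```

In this model the timestamped writes always converge to the source value no matter how the edits are repeated or reordered, while the delta total drifts whenever any edit is delivered more than once.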
