Introduction

HBase replication provides a way to replicate HBase data from one cluster to another by registering the remote cluster's ZooKeeper quorum as a peer.

Configuration on the cluster

First, set the hbase.replication property to true. Then add the remote peer through the HBase shell. The peer id can be any short name.

For example:

add_peer '1', "hdpdstzk01.machine.domain,hdpdstzk02.machine.domain,hdpdstzk03.machine.domain:2181:/hbase-secure"

(If Kerberos is in use, either the correct JAAS configuration must be in place or a ticket obtained with the hbase service keytab must be in the credential cache, so that authentication against ZooKeeper through SASL succeeds.)
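The peer address follows the pattern quorum:port:znode, with the quorum hosts separated by commas and no spaces. As a minimal sketch, the helper below assembles that string and prints the matching add_peer command. The function name build_cluster_key is hypothetical (it is not part of HBase), and the host names, port, and znode are simply the ones from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an HBase tool): join ZooKeeper hosts into a
# peer address of the form host1,host2,host3:port:/znode
build_cluster_key() {
  local port="$1" znode="$2"
  shift 2
  local IFS=','            # "$*" joins the remaining arguments with commas
  printf '%s:%s:%s\n' "$*" "$port" "$znode"
}

key="$(build_cluster_key 2181 /hbase-secure \
  hdpdstzk01.machine.domain hdpdstzk02.machine.domain hdpdstzk03.machine.domain)"

# Print the command only; it can then be pasted into the HBase shell.
printf "add_peer '1', \"%s\"\n" "$key"
```

Some newer HBase releases expect the address as CLUSTER_KEY => "..." rather than a bare string, so it is worth checking help 'add_peer' on the version in use.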

Configuration on the tables

Replication is configured per table and column family by setting the REPLICATION_SCOPE property to '1'. Tables created without this property get the default value '0', which means no replication. To apply it to an existing table, disable the table, add the property through alter, and then enable the table again.

For example:

alter "product:user", {NAME => 'document', REPLICATION_SCOPE => '1'}
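For an existing table, the disable, alter, enable sequence described above can be sketched as a small script that only prints the HBase shell commands. The function name replication_commands is hypothetical, and the table and column family are the ones from the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HBase shell commands that switch on
# replication for one column family of an existing table.
replication_commands() {
  local table="$1" family="$2"
  printf "disable '%s'\n" "$table"
  printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => '1'}\n" "$table" "$family"
  printf "enable '%s'\n" "$table"
}

replication_commands 'product:user' 'document'

# To actually run the commands, the output could be piped into the shell:
#   replication_commands 'product:user' 'document' | hbase shell
```

Printing the commands first makes it easy to review them before they touch a live table, since disabling a table makes it unavailable until it is enabled again.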

Copying existing data across

If there is already data on the source table, it can be replicated initially through the CopyTable command:

bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=hdpdstzk01.machine.domain,hdpdstzk02.machine.domain,hdpdstzk03.machine.domain:2181:/hbase-secure [--new.name=mytableCopy] [--starttime=abc --endtime=xyz] mytable

new.name is only needed when the destination table name differs from the source table name.

starttime and endtime restrict the copy to a specific interval of HBase cell timestamps, given as Unix time in milliseconds.
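Since CopyTable takes starttime and endtime as Unix time in milliseconds, a readable date has to be converted first. The sketch below does this with GNU date. The function name to_epoch_ms and the two dates are illustrative assumptions, and the script only prints the resulting options rather than launching a job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a UTC date string to epoch milliseconds.
# Assumes GNU date (the -d option is not available on every platform).
to_epoch_ms() {
  echo "$(( $(date -u -d "$1" +%s) * 1000 ))"
}

# Illustrative interval; replace with the window that should be copied.
start="$(to_epoch_ms '2020-01-01 00:00:00')"
end="$(to_epoch_ms '2020-02-01 00:00:00')"

# Print the CopyTable options for this interval.
echo "--starttime=${start} --endtime=${end}"
```

These values apply to cell timestamps, so they only match wall-clock time if the application has not written custom timestamps.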
