HBase Region Replication and Bulk Load Strategy?

As https://issues.apache.org/jira/browse/HBASE-13153 shows, there is a mismatch between region replication and bulk loads that prevents data from bulk loads from being replicated (if I understand everything correctly). That JIRA suggests help is on the way, but is there a strategy we should use in the meantime?

https://community.hortonworks.com/content/kbentry/90957/hbase-replication-faq.html suggests to me that we should copy the table or do something similar. Maybe we should disable replication and re-enable it? I'm just looking for a high-level strategy for dealing with these two features, which don't seem to play nicely together.

Minor correction: region replication (read replicas of a region within one cluster) and [table] replication (shipping edits to another cluster) are two separate things. Be sure not to confuse the two 🙂

Where HBase replication does not support replicating bulk-loaded HFiles, the simplest workaround is to distcp the HFiles you are about to bulk load to the destination cluster and bulk load them there as well.
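A minimal sketch of that workaround, assuming HBase 1.x-era tooling: the Python helper below only builds the command lines in the order they would be run; it does not execute anything. The function name plan_bulk_load, the path /tmp/hfiles/t1, the peer URI hdfs://peer-nn:8020 and the table t1 are all hypothetical. The one ordering detail that matters is that the copy happens before the local bulk load, because the bulk-load tool moves the HFiles out of the staging directory into the table's region directories.

```python
import shlex
from typing import List, Tuple

# HBase 1.x-era bulk-load tool class. Newer releases relocate it, so treat
# this name as an assumption and check it against your HBase version.
LOADER = "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles"


def plan_bulk_load(hfile_dir: str, peer_fs: str, table: str) -> List[Tuple[str, List[str]]]:
    """Return (cluster, argv) pairs in the order they should be run.

    hfile_dir: absolute HDFS directory holding the prepared HFiles (hypothetical).
    peer_fs:   filesystem URI of the destination cluster (hypothetical).
    table:     target table name, assumed to exist on both clusters.
    """
    if not hfile_dir.startswith("/"):
        raise ValueError("hfile_dir must be an absolute path")
    peer_dir = peer_fs.rstrip("/") + hfile_dir
    return [
        # 1. Copy first: the bulk load moves the HFiles into the table's
        #    region directories, so they would no longer be under hfile_dir.
        ("source", ["hadoop", "distcp", hfile_dir, peer_dir]),
        # 2. Bulk load on the source cluster.
        ("source", ["hbase", LOADER, hfile_dir, table]),
        # 3. Bulk load the copied files on the destination cluster
        #    (run with the destination cluster's HBase configuration).
        ("peer", ["hbase", LOADER, hfile_dir, table]),
    ]


if __name__ == "__main__":
    for cluster, argv in plan_bulk_load("/tmp/hfiles/t1", "hdfs://peer-nn:8020", "t1"):
        print(f"[{cluster}] {shlex.join(argv)}")
```

Each argv list could be passed to subprocess.run on an edge node of the matching cluster; keeping the plan as data rather than running it directly makes the copy-before-load ordering easy to review.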

Fair enough on cluster-to-cluster replication. I'm thinking about intra-cluster replication of regions (region replicas) for highly available reads. Is this already handled automatically, and if not, are there strategies to deal with it?