Help setting up HBase Bulkload Replication

Contributor

I am currently in the process of setting up replication on HBase. Normal HBase replication (based on the WALs) is set up and working well. However, I am struggling to set up replication for bulk loading as specified in [HBASE-13153].

I am running HDP 2.6. I initially thought the HBase version was the issue, but according to the release notes this was patched in HDP 2.4.3, so the HBase version is not the issue here. I have followed the instructions provided in the link as well as I can understand them, but bulk loaded rows are not being replicated.

I have set the following configurations:

  • On the source cluster: hbase.replication.bulkload.enabled=true
  • On the source cluster: hbase.replication.cluster.id=source
  • On the peer cluster: hbase.replication.conf.dir=/tmp/fs_conf
  • On the peer cluster: hbase.replication.source.fs.conf.provider=org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider
  • I have copied core-site.xml, hdfs-site.xml, yarn-site.xml, and hbase-site.xml from the source cluster to /tmp/fs_conf/source on every region server of the peer cluster
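
The layout described in the list above can be checked on each peer region server with a short script. This is only a minimal sketch: it assumes the source cluster's files are expected under `<hbase.replication.conf.dir>/<hbase.replication.cluster.id>`, as in the setup above. The function name `missing_conf_files` and the list of expected files are illustrative and not part of HBase.

```python
import os

# The four files the post copies from the source cluster.
# Whether all four are strictly required is an assumption.
EXPECTED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml", "hbase-site.xml")


def missing_conf_files(conf_dir, cluster_id, expected=EXPECTED_FILES):
    """Return the expected files that are absent or unreadable
    under <conf_dir>/<cluster_id> for the current OS user."""
    base = os.path.join(conf_dir, cluster_id)
    return [name for name in expected
            if not os.access(os.path.join(base, name), os.R_OK)]


if __name__ == "__main__":
    # Values taken from the post: conf dir /tmp/fs_conf, cluster id "source".
    missing = missing_conf_files("/tmp/fs_conf", "source")
    print("missing or unreadable:", missing or "none")
```

Readability is tested for whichever user runs the script, so it is most informative when run as the account that the region server process uses.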

These are the only changes specified in the JIRA (at least as far as I understand it); however, the rows that are bulk loaded are not being replicated. I suspect I have missed or misinterpreted part of the instructions.

Any help will be appreciated.

3 REPLIES

Re: Help setting up HBase Bulkload Replication

Super Collaborator

Is hbase.replication.conf.dir defined on every node in the peer cluster?

Specifying /tmp is not a good idea, since files under /tmp are periodically cleaned.
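
One way to confirm the first point on a given node is to read the property straight from that node's hbase-site.xml. The following is a minimal sketch, assuming the usual Hadoop configuration XML layout (`<property>` elements containing `<name>` and `<value>`). The helper name `read_property` is hypothetical, and the file location varies by installation.

```python
import xml.etree.ElementTree as ET


def read_property(site_xml_path, key):
    """Return the value of `key` from a Hadoop-style *-site.xml file,
    or None if the property is not defined in that file."""
    root = ET.parse(site_xml_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == key:
            # An empty or missing <value> element is reported as "".
            return (prop.findtext("value") or "").strip()
    return None
```

Calling `read_property(path, "hbase.replication.conf.dir")` with the path to the node's hbase-site.xml returns the configured directory, or `None` if the property is absent. It shows only what is on disk, not what a running region server has loaded.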

Re: Help setting up HBase Bulkload Replication

Contributor

I set that configuration through Ambari, which should propagate the config to all nodes in the cluster if I understand correctly. Should I perhaps include a trailing / in the config?

I am only using /tmp for testing purposes while replicating from our dev cluster. Once this goes to our production and DR clusters, I will choose a more suitable location for the configuration files.

Re: Help setting up HBase Bulkload Replication

New Contributor

Is this issue resolved?
