Contributor
Posts: 27
Registered: ‎02-24-2015
Accepted Solution

HBase Replication between two clusters for different major CDH

Hello,

I am trying to replicate HBase data from our existing production cluster (CDH 4.5) to a new cluster (CDH 5.3) on AWS. I checked the Cloudera documentation, and it requires that the two clusters run the same major version of CDH. That is a blocker for us.

Is there a way to set up replication between two clusters even though they run different major versions of CDH?

Do you have any recommendations for our case?

Thank you very much,

Thai

Expert Contributor
Posts: 101
Registered: ‎01-24-2014

Re: HBase Replication between two clusters for different major CDH

Hello Thai,

 

Going from CDH 4.5 (HBase 0.94.x) to CDH 5.3 (HBase 0.98.x) is actually two major version jumps. The big one is that 0.96 introduced "the singularity", which in short means HBase 0.94 is not wire compatible with 0.96 and later, so built-in replication cannot run between the two clusters. [1]

 

If you absolutely can't upgrade both clusters to the same version at the same time, then you will have to disable replication and create your own way of replicating the data.  

 

I know of two methods to move the data between clusters that would still work:

1: The REST client (slow, so it may not be able to keep up, depending on your use case)

2: Export to HDFS -> DistCp -> Import to target cluster (batch, so there will be a large lag in synchronization)
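
As a rough illustration of method 2, here is a minimal Python sketch that only builds and prints the three command lines; it does not run anything. The table name, paths, and NameNode host are hypothetical placeholders. Reading the source over webhdfs:// and running DistCp from the new cluster is an assumption about how to copy between different Hadoop major versions, and the `hbase.import.version=0.94` property is what the HBase reference guide describes for importing 0.94 export files into 0.96 or later. Please check all of it against your own setup before using it.

```python
import shlex

EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
IMPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Import"


def build_commands(table, export_dir, source_uri, target_dir,
                   start_ms=None, end_ms=None, versions=1):
    """Return the three commands for Export -> DistCp -> Import as argv lists.

    All arguments are placeholders chosen by the caller; nothing is executed.
    """
    # Step 1 (run on the old cluster): dump the table to HDFS.
    # Optional arguments follow the usage
    #   Export <table> <outputdir> [<versions> [<starttime> [<endtime>]]]
    export = ["hbase", EXPORT_CLASS, table, export_dir]
    if start_ms is not None and end_ms is not None:
        export += [str(versions), str(start_ms), str(end_ms)]

    # Step 2 (assumed to run on the new cluster): copy the export files over.
    distcp = ["hadoop", "distcp", source_uri, target_dir]

    # Step 3 (run on the new cluster): load the files into a table that
    # has already been created there with the same column families.
    # The -D property tells Import the files came from a 0.94 Export.
    import_cmd = ["hbase", "-Dhbase.import.version=0.94",
                  IMPORT_CLASS, table, target_dir]

    return [export, distcp, import_cmd]


def render(cmd):
    """Quote an argv list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in build_commands(
            table="my_table",
            export_dir="/export/my_table",
            source_uri="webhdfs://old-namenode:50070/export/my_table",
            target_dir="/import/my_table"):
        print(render(cmd))
```

Passing `start_ms` and `end_ms` (epoch milliseconds) limits the export to a time window, which is one way to run the batch repeatedly and only pick up recent changes.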

 

 

[1] http://hbase.apache.org/book.html#upgrade0.96

 

Hope this helps!

-Ben

Contributor
Posts: 27
Registered: ‎02-24-2015

Re: HBase Replication between two clusters for different major CDH

I tried the 2nd method and it worked.

Thank you very much for your help.

New Contributor
Posts: 2
Registered: ‎07-01-2016

Re: HBase Replication between two clusters for different major CDH

Hi,

 

I have a requirement regarding HBase replication.

I have two clusters, one PROD and one DR. Both have the same IPs and server names but are on different networks.

The customer wants to assign different IPs for the replication between PROD and DR.

For example, PROD, which is my source, would have one IP that communicates with my DR cluster, which would have another IP.

So PROD and DR would communicate with each other over the new IPs.

In this scenario, what configuration do we need, and in which files, on the source side and the target side?

Please suggest.