HBase replication between two clusters on different major CDH versions

Rising Star

Hello,

I am trying to replicate HBase data from our existing production cluster (CDH 4.5) to a new cluster (CDH 5.3) on AWS. I checked the Cloudera documentation, and it requires that the two clusters run the same major version of CDH. That is a blocker for us.

Is there a way to set up replication between two clusters even though they run different versions of CDH?

Do you have any recommendations for our case?

Thank you very much,

Thai

1 ACCEPTED SOLUTION

Master Collaborator

Hello Thai,

Going from CDH 4.5 (HBase 0.94.x) to CDH 5.3 (HBase 0.98.x) is actually two major version jumps. The big one is that 0.96 introduced "the singularity", which in short means HBase is not wire compatible between the two versions. [1]

If you absolutely can't upgrade both clusters to the same version at the same time, then you will have to disable replication and create your own way of replicating the data.

I know of two methods to move the data between clusters that would still work:

1: The REST client (slow, so it may not be able to keep up, depending on your use case)

2: Export to HDFS -> DistCp -> Import on the target cluster (batch, so there will be a large lag in synchronization)
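
For anyone following along, here is a rough sketch of what the second method could look like. It is only an illustration, not a tested procedure for your clusters: the table name, paths, and NameNode hostnames are made up, and reading the source over webhdfs:// is my assumption for copying between different CDH versions. The `-Dhbase.import.version=0.94` property is the one the HBase reference guide describes for importing 0.94 exports into 0.96 or later. The script only builds and prints the three commands; it does not run them.

```python
import shlex


def build_copy_commands(table, export_dir, src_uri, dst_uri, source_version="0.94"):
    """Build the Export -> DistCp -> Import command lines as argument lists."""
    # Step 1 (run on the source cluster): dump the table to HDFS as sequence files.
    export_cmd = ["hbase", "org.apache.hadoop.hbase.mapreduce.Export", table, export_dir]
    # Step 2: copy the exported files to the target cluster's HDFS.
    distcp_cmd = ["hadoop", "distcp", src_uri, dst_uri]
    # Step 3 (run on the target cluster): load the files into a table that
    # already exists there with the same column families.
    import_cmd = [
        "hbase",
        "-Dhbase.import.version=" + source_version,
        "org.apache.hadoop.hbase.mapreduce.Import",
        table,
        dst_uri,
    ]
    return [export_cmd, distcp_cmd, import_cmd]


def to_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    commands = build_copy_commands(
        table="my_table",
        export_dir="/tmp/hbase-export/my_table",
        src_uri="webhdfs://old-namenode:50070/tmp/hbase-export/my_table",
        dst_uri="hdfs://new-namenode:8020/tmp/hbase-export/my_table",
    )
    for cmd in commands:
        print(to_shell(cmd))
```

Export also accepts optional version count, start time, and end time arguments after the output directory, which may help if you want to copy only recent changes on each batch run.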

[1] http://hbase.apache.org/book.html#upgrade0.96

Hope this helps!

-Ben

3 REPLIES 3

Rising Star

I tried the second method and it worked.

Thank you very much for your help.

New Contributor

Hi,

I have a requirement regarding HBase replication.

I have two clusters, one PROD and one DR. Both use the same IP addresses and server names but are on different networks.

The customer wants to assign different IPs for replication between PROD and DR.

For example, PROD, which is my source, would have one IP that communicates with my DR cluster, which would have another IP.

So PROD and DR would communicate using the new IPs.

In this scenario, what configuration do we need, and in which files, on the source side and the target side?

Please suggest.