Data Migration between Hbase across major versions (1.x to 2.x) across data centers

New Contributor

Hi,


We are moving to a new data center and have to migrate the data from the old cluster to the new cluster.

Cluster     Old                 New
HDFS        2.7.1.2.3           3.1.1.3.1
Hbase       1.1.2.2.3.2.0       2.0.0.3.1
Zookeeper   3.4.6.2.3           3.4.9.3.1
OS          Debian 7 (Wheezy)   Debian 9 (Stretch)


Since the major versions differ, the only option I know of for data migration is CopyTable (plus SyncTable).

Are there any challenges to be aware of, given that the two clusters run different OS versions?


Is there a way to replicate the data online?

We have around 40 tables, each holding around 1 TB of data. How many CopyTable operations can be run in parallel?


Thank you

5 REPLIES

New Contributor

Wonderful question; too bad no one answered it. We are handling something similar (upgrading from HDP 2.6.5 to HDP 3.1.5 on a different cluster, HBase 1.1.2 to HBase 2.0.2). We asked Cloudera support and they offered us the same solution, CopyTable/SyncTable, but this can't be run from the destination cluster (since the source cluster is currently in production), so we're looking at a snapshot-based solution. We still need to determine whether it is feasible and what challenges it involves...

Any input from your experience? Did you manage to complete this migration/upgrade?

Thank You!

Contributor

@ValiD_M Did you find any solution for this?

Explorer

Did anyone find a solution?

Community Manager

@Ivoz As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks!


Regards,

Diana Torres,
Community Moderator



Super Collaborator

Hello All,

 

This is an older post that has had a few recent follow-up queries. To close the loop: HBase offers multiple tools to migrate data from one cluster to another, such as Snapshot, Export/Import, and HashTable/SyncTable. Most of these tools rely on MapReduce and use one mapper per region of the source table. All of these tools work without any known concerns. The only part of the question that can't be answered accurately is the concurrency, job configuration, mapper memory, and so on. These details depend on the customer's environment setup and the bandwidth between the two clusters. As such, the customer can run one such HBase MR job, observe the outcome, and fine-tune accordingly.
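
For anyone who wants to try the snapshot route, here is a minimal sketch of how the export step could be scripted. It only builds the command line for HBase's ExportSnapshot MapReduce tool so it can be reviewed before it is run. The snapshot name, the destination NameNode address and HBase root directory, and the mapper and bandwidth values are hypothetical placeholders, not values from this thread; adjust them to your environment.

```python
import shlex


def export_snapshot_cmd(snapshot, dest_root, mappers=16, bandwidth_mb=100):
    """Build the argv for HBase's ExportSnapshot MapReduce tool.

    snapshot     -- name of a snapshot that already exists on the source cluster
    dest_root    -- HBase root directory on the destination HDFS (assumed value)
    mappers      -- number of map tasks used for the copy (assumed starting point)
    bandwidth_mb -- per-mapper bandwidth cap in MB/s (assumed starting point)
    """
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", dest_root,
        "-mappers", str(mappers),
        "-bandwidth", str(bandwidth_mb),
    ]


# Hypothetical example values: one table's snapshot, copied to the new cluster.
cmd = export_snapshot_cmd("orders_snap", "hdfs://new-nn:8020/apps/hbase/data")

# Print a shell-safe command line to review and run on the source cluster.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The snapshot itself would be taken beforehand in the HBase shell on the source cluster (snapshot 'table', 'snapshot_name'), and on the destination it can be materialized with clone_snapshot. Since this crosses a major version (1.x to 2.x), it is worth validating the whole cycle on one small table first, then raising the mapper count and bandwidth cap gradually based on what the link between the data centers can sustain.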

 

If any issues are observed while running the above HBase MR job, feel free to post the question in a new community post for fellow community members to review and share their thoughts.

 

Regards, Smarak