Copying data from one HBase cluster to another HBase cluster

Hi, I would like to copy data from one HBase cluster to another HBase cluster. The copy should happen in batches of 1000 records: the first 1000 records as one set, the next 1000 records as a second set, and so on. How can I achieve this using commands or scripts?

Note: my HBase table is huge, so I would like to split the data and copy it from one cluster to the other (without moving it to HDFS).

Accepted solution

Re: Copying data from one HBase cluster to another HBase cluster

I would strongly suggest you look at HBase's snapshot feature, as detailed at https://hbase.apache.org/book.html#ops.snapshots. Creating a snapshot is very fast because it does NOT copy the underlying HFiles on HDFS; it only records references ("pointers") to them. You can then use the ExportSnapshot tool, which copies the referenced HFiles over to the second HBase cluster. This approach needs no extra space on the source cluster (well, delete the snapshot once you are done!), and on the target cluster it takes up only the space of the HFiles themselves, which have to be created there anyway.
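
To make the steps above concrete, here is a minimal Python sketch that only builds the commands involved and prints them; it does not run anything against a cluster. The table name, snapshot name, target HDFS URL, and mapper count are all hypothetical placeholders, and the final clone_snapshot step on the target cluster is my assumption about how you would turn the exported snapshot into a usable table, so please check everything against the snapshot section of the HBase book for your version.

```python
import shlex


def snapshot_shell_command(table, snapshot):
    """HBase shell statement that takes a snapshot of `table`."""
    return f"snapshot '{table}', '{snapshot}'"


def export_snapshot_argv(snapshot, dest_root, mappers=16):
    """Argument list for the ExportSnapshot MapReduce tool.

    `dest_root` is assumed to be the HBase root directory on the
    target cluster's HDFS (hypothetical value in the example below).
    """
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", dest_root,
        "-mappers", str(mappers),
    ]


def clone_shell_command(snapshot, table):
    """HBase shell statement that creates a table from a snapshot."""
    return f"clone_snapshot '{snapshot}', '{table}'"


def delete_shell_command(snapshot):
    """HBase shell statement that removes a snapshot."""
    return f"delete_snapshot '{snapshot}'"


def plan(table, snapshot, dest_root, mappers=16):
    """Return (where to run it, command) pairs in execution order."""
    return [
        ("source cluster, hbase shell", snapshot_shell_command(table, snapshot)),
        ("source cluster, OS shell",
         shlex.join(export_snapshot_argv(snapshot, dest_root, mappers))),
        # Assumption: clone on the target to materialize the table.
        ("target cluster, hbase shell", clone_shell_command(snapshot, table)),
        ("source cluster, hbase shell", delete_shell_command(snapshot)),
    ]


if __name__ == "__main__":
    # All values below are placeholders, not taken from the question.
    for where, cmd in plan("my_table", "my_table_snap",
                           "hdfs://target-nn:8020/hbase"):
        print(f"[{where}] {cmd}")
```

Note that this copies whole HFiles rather than sets of 1000 rows, so the load is spread across mappers instead of row batches; the -mappers value is the knob to tune if the copy is too heavy for either cluster.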

Good luck and happy HBasing!
