Copy one HBase table to another in a short duration
Labels: Apache HBase
Created 09-20-2016 09:24 AM
I want to copy data from one HBase table to another, but it is taking a huge amount of time, and meanwhile other jobs in the cluster are failing because of it. I tried the three techniques below, and the time taken is still on the higher side. Can you suggest a more efficient approach?
1) org.apache.hadoop.hbase.mapreduce.CopyTable
2) org.apache.hadoop.hbase.mapreduce.Export
3) Take a snapshot of the table, then clone the snapshot into another table.
Thanks in advance.
Created 09-20-2016 09:31 AM
@Dheeraj,
HBase snapshots are the best method for backup and disaster recovery procedures:
snapshot 'sourceTable', 'sourceTable-snapshot'
clone_snapshot 'sourceTable-snapshot', 'newTable'
Created 09-20-2016 09:40 AM
Thanks Nitin, but I had already tried snapshot and clone, and it is taking a huge amount of time.
Created 09-20-2016 10:02 AM
This is the simplest and best method for this. But you can use the HTable API for backup as well:
HTable API (such as a custom Java application)
As is always the case with Hadoop, you can write your own custom application that uses the public API and queries the table directly. You can do this through MapReduce jobs in order to take advantage of that framework's distributed batch processing, or through any other means of your own design. However, this approach requires a deep understanding of Hadoop development and of all the APIs and their performance implications in your production cluster. A rough sketch of the client-API approach is shown below.
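If you do go the client-API route, here is a minimal, hedged sketch of what such a copy could look like with the standard HBase 1.x Java client. It is not a MapReduce job, so it runs in a single process and will be slow for very large tables. The table names sourceTable and newTable are placeholders, and the destination table is assumed to already exist with the same column families:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class SimpleTableCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table source = conn.getTable(TableName.valueOf("sourceTable"));   // placeholder name
             Table target = conn.getTable(TableName.valueOf("newTable"))) {    // placeholder name

            Scan scan = new Scan();
            scan.setCaching(500);        // fetch rows in batches to reduce RPC round trips
            scan.setCacheBlocks(false);  // a full scan should not pollute the block cache

            try (ResultScanner scanner = source.getScanner(scan)) {
                for (Result row : scanner) {
                    Put put = new Put(row.getRow());
                    for (Cell cell : row.rawCells()) {
                        put.add(cell);   // copy each cell as-is, preserving timestamps
                    }
                    target.put(put);
                }
            }
        }
    }
}

For a large table you would normally wrap the same Scan/Put logic in a MapReduce job, which is essentially what CopyTable already does for you.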
Created 09-20-2016 10:03 AM
Are you copying data within the same cluster or to a different cluster?
Created 09-20-2016 11:39 AM
Same cluster
Created 09-20-2016 11:46 AM
Then snapshot is the best option among them.
Created 09-20-2016 11:46 AM
I am sorry Nitin, I don't have much understanding of the HTable API; could you please suggest a different command?
Or is it possible to copy only part of the data (say, a few columns or rows) using a snapshot, so that I can copy the table in 4-5 iterations?
Created 09-20-2016 12:30 PM
How can we run this in more than one iteration using snapshots, please?
Created 09-20-2016 12:40 PM
@Dheeraj,
We can't run a snapshot in multiple iterations, but we can use CopyTable and copy data from one timestamp to another, like this:
http://hbase.apache.org/0.94/book/ops_mgt.html#copytable
CopyTable is a utility that can copy part or all of a table, either to the same cluster or to another cluster. The usage is as follows:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
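As a rough example of splitting the copy into several smaller runs (the timestamps and table names below are only placeholders; timestamps are epoch milliseconds), each invocation copies only the cells whose timestamps fall in its window:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1473724800000 --endtime=1474156800000 --new.name=newTable sourceTable
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1474156800000 --endtime=1474588800000 --new.name=newTable sourceTable

Running the windows back to back spreads the work over several shorter MapReduce jobs instead of one long one, which should leave more room for the other jobs on the cluster.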
