HBase CopyTable Command Options

New Contributor

I want to copy a table from one HBase cluster to another using the CopyTable command. By default it runs with a single mapper and scans all rows, which causes a timeout. Are there any CopyTable options that would improve performance without adding parameters to hbase-site.xml?

 


hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=myserver:/hbase --new.name=<<tablename>>  <<tablename>> 

1 ACCEPTED SOLUTION

Super Collaborator

Hello @rootuser,

 

Thanks for using Cloudera Community. Based on the post, you are using CopyTable to copy HBase table(s) from one cluster to another, and only 1 mapper is being launched.

 

Please confirm whether the source table has only 1 region. Also confirm whether CopyTable on a table with more than 1 region (say, 5 regions) creates 1 mapper or 5 mappers. Finally, please share the HBase version you are using and the exact timeout error you are seeing.

 

As far as I recall, the MapReduce job behind CopyTable launches 1 mapper per region of the source table, so it's likely the source table has only 1 region. In that case, increasing the number of regions (by splitting or pre-splitting the table) or increasing the timeout should help.
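
If the goal is to raise the timeout without touching hbase-site.xml, one option is to pass the settings as per-job -D overrides on the command line. The following is only a sketch: the table name and every numeric value are placeholders I picked for illustration, the peer address is the one from the question, and the right values depend on your cluster. The script only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: build a CopyTable command with per-job -D overrides
# instead of editing hbase-site.xml. All values are illustrative assumptions.

TABLE="${TABLE:-my_table}"        # hypothetical table name
PEER="${PEER:-myserver:/hbase}"   # peer address as written in the question

build_copytable_cmd() {
  # Generic -D options go before CopyTable's own --options.
  printf '%s ' \
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    -Dhbase.client.scanner.caching=500 \
    -Dhbase.client.scanner.timeout.period=600000 \
    -Dhbase.rpc.timeout=600000 \
    -Dmapreduce.map.speculative=false \
    "--peer.adr=${PEER}" \
    "--new.name=${TABLE}" \
    "${TABLE}"
}

# Print the command for review; copy and run it once the values look right.
build_copytable_cmd
echo
```

Note that these overrides only give the single mapper more time and fewer round trips. They do not add mappers; for that the source table still needs more than 1 region.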

 

Regards, Smarak

