
Hbase export - split the sequence files with specific size limit

arunr307 — Mon, 11 Apr 2022 20:24:42 GMT

Hi,

I am trying to export a large HBase table into sequence files. Our requirement is to split the sequence files at a specific size limit. We have tried multiple options, but the output part files are still generated at large sizes, around 6-7 GB. We used the sample export command below, which was intended to generate sequence files in the range of 100 MB to 135 MB, but it didn't work as expected.

hbase org.apache.hadoop.hbase.mapreduce.Export -D dfs.blocksize=1048576 -D mapred.min.split.size=102400 -D mapred.max.split.size=1048576 \
table_name /output/hbase/

Could you please help me export the table into smaller chunks?
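
As a quick unit check on the values in the export command above, here is a minimal sketch that converts them from bytes. The helper names `to_mib` and `mib_to_bytes` are hypothetical, and the sketch assumes binary units (1 MiB = 1048576 bytes). It only shows the arithmetic: the values passed correspond to 1 MiB and 100 KiB, not to the 100 MB to 135 MB target. It makes no claim about whether these properties influence the Export job at all.

```shell
#!/bin/sh
# Convert the byte values from the export command into KiB/MiB and
# compute byte values for the 100 MB to 135 MB target range.
# Helper names are hypothetical; binary units (1 MiB = 1048576 bytes) assumed.

# Integer MiB for a byte count (rounds down).
to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Byte count for a size given in MiB.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

echo "dfs.blocksize=1048576         -> $(to_mib 1048576) MiB"
echo "mapred.max.split.size=1048576 -> $(to_mib 1048576) MiB"
echo "mapred.min.split.size=102400  -> $(( 102400 / 1024 )) KiB"
echo "100 MiB = $(mib_to_bytes 100) bytes"
echo "135 MiB = $(mib_to_bytes 135) bytes"
```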

Re: Hbase export - split the sequence files with specific size limit

willx — Fri, 22 Apr 2022 09:44:13 GMT

Hello @arunr307 ,

What is your CDH version? Could you attach the full output of this command? According to the command's help menu, there are no properties for split size:

# hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]

Note: -D properties will be applied to the conf used.
For example:
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART>
-D hbase.mapreduce.scan.row.stop=<ROWSTOP>
-D hbase.client.scanner.caching=100
-D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
-D hbase.export.scanner.batch=10
-D hbase.export.scanner.caching=100
-D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
-D mapreduce.map.speculative=false
-D mapreduce.reduce.speculative=false

Thanks,

Will
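
The help text above lists no split-size property, but it does list `hbase.mapreduce.scan.row.start` and `hbase.mapreduce.scan.row.stop`. One possible workaround, sketched here as a dry run, is to run Export once per row-key range, each into its own output directory. The boundary keys (`row_a`, `row_h`, ...), the `chunk_N` directory naming, and the helper function names are all hypothetical. The sketch assumes the stop row is exclusive, as in a standard HBase scan, so consecutive ranges do not overlap. It does not guarantee any particular file size, since that depends on how the data is distributed across the key space.

```shell
#!/bin/sh
# Dry run: print one Export command per row-key range instead of running it.
# TABLE, OUT_BASE and the boundary keys are hypothetical placeholders.
TABLE="table_name"
OUT_BASE="/output/hbase"

# Print the Export command for one row-key range.
# $1 = start row, $2 = stop row, $3 = chunk index
export_cmd() {
  echo "hbase org.apache.hadoop.hbase.mapreduce.Export" \
       "-D hbase.mapreduce.scan.row.start=$1" \
       "-D hbase.mapreduce.scan.row.stop=$2" \
       "$TABLE $OUT_BASE/chunk_$3"
}

# Walk consecutive boundary pairs: (k0,k1), (k1,k2), ...
print_chunks() {
  prev=""
  i=0
  for key in "$@"; do
    if [ -n "$prev" ]; then
      export_cmd "$prev" "$key" "$i"
      i=$((i + 1))
    fi
    prev="$key"
  done
}

print_chunks row_a row_h row_p row_z
```

Narrower key ranges would be expected to produce smaller outputs per run, so the boundaries would need tuning against the actual table before the printed commands are executed.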

Re: Hbase export - split the sequence files with specific size limit

VidyaSargur — Thu, 28 Apr 2022 08:24:49 GMT

@arunr307, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.