Created 04-11-2022 01:24 PM
Hi,
I am trying to export a large HBase table into sequence files. Our requirement is to split the sequence files at a specific size limit. We have tried multiple options, but the output part files are still being generated at large sizes, around 6-7 GB each. We used the sample export command below, which was meant to generate sequence files in the range of 100 MB - 135 MB, but it didn't work as expected.
hbase org.apache.hadoop.hbase.mapreduce.Export -D dfs.blocksize=1048576 -D mapred.min.split.size=102400 -D mapred.max.split.size=1048576 table_name /output/hbase/
Could you please help me export the table into smaller chunks?
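For reference, here is the same attempt written with the current Hadoop property names (mapreduce.input.fileinputformat.split.minsize and split.maxsize replace the deprecated mapred.min.split.size and mapred.max.split.size). This is just a cleaned-up sketch of what we ran, with the same values:

hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D dfs.blocksize=1048576 \
  -D mapreduce.input.fileinputformat.split.minsize=102400 \
  -D mapreduce.input.fileinputformat.split.maxsize=1048576 \
  table_name /output/hbase/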
Created 04-22-2022 02:44 AM
Hello @arunr307,
What is the CDH version? Could you attach the full output of this command? From the command's help menu, there are no properties for controlling split size:
# hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
Note: -D properties will be applied to the conf used.
For example:
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART>
-D hbase.mapreduce.scan.row.stop=<ROWSTOP>
-D hbase.client.scanner.caching=100
-D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
-D hbase.export.scanner.batch=10
-D hbase.export.scanner.caching=100
-D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
-D mapreduce.map.speculative=false
-D mapreduce.reduce.speculative=false
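Since the help above only exposes SCAN properties, one rough workaround (just a sketch, assuming your row keys let you pick reasonable range boundaries) would be to run the export once per key range using the hbase.mapreduce.scan.row.start/stop properties listed above, so that each run writes a smaller output directory:

hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.mapreduce.scan.row.start=<ROWSTART> \
  -D hbase.mapreduce.scan.row.stop=<ROWSTOP> \
  table_name /output/hbase/chunk_0001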
Thanks,
Will
Created 04-28-2022 01:24 AM
@arunr307, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,