
Hbase export - split the sequence files with specific size limit

New Contributor

Hi,

 

I am trying to export a large HBase table to sequence files. Our requirement is to split the sequence files at a specific size limit, but with every option we have tried, the output part files come out very large, around 6-7 GB each. We used the sample export command below, hoping to generate sequence files in the range of 100 MB - 135 MB, but it did not work as expected.

hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D dfs.blocksize=1048576 \
  -D mapred.min.split.size=102400 \
  -D mapred.max.split.size=1048576 \
  table_name /output/hbase/
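For reference, here is the same attempt rewritten as a sketch with the non-deprecated Hadoop property names and with byte values that actually fall in the 100 MB - 135 MB range (table_name is a placeholder; 104857600 = 100 MB and 134217728 = 128 MB, whereas in the command above 1048576 is only 1 MB and 102400 is only 100 KB). Whether the Export job honours these split-size properties at all is part of the question:

hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D dfs.blocksize=134217728 \
  -D mapreduce.input.fileinputformat.split.minsize=104857600 \
  -D mapreduce.input.fileinputformat.split.maxsize=134217728 \
  table_name /output/hbase/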

Could you please help me export the table into smaller chunks?

2 REPLIES

Master Collaborator

Hello @arunr307,

What is the CDH version? Could you attach the full output of this command? From the command's help output, there are no properties related to split size:

# hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]

Note: -D properties will be applied to the conf used.
For example:
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapreduce.output.fileoutputformat.compress.type=BLOCK
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART>
-D hbase.mapreduce.scan.row.stop=<ROWSTOP>
-D hbase.client.scanner.caching=100
-D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
-D hbase.export.scanner.batch=10
-D hbase.export.scanner.caching=100
-D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
-D mapreduce.map.speculative=false
-D mapreduce.reduce.speculative=false

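As far as I can tell from the usage text above, Export does not expose any split-size property: it runs as a map-only job over the table's regions, so it normally writes one part file per mapper (one per region), which would explain why the mapred.*.split.size settings in your command have no visible effect and the part files end up as large as the regions themselves. Two things worth trying, sketched below on the assumption that a suitable codec is available on your cluster: block compression (one of the options listed above) shrinks each part file but does not change how many there are, while pre-splitting the table's regions (for example with the hbase shell 'split' command) increases the number of mappers and therefore yields more, smaller files.

hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
  table_name /output/hbase/

If neither of those is an option, a follow-up MapReduce pass over the exported sequence files (where mapreduce.input.fileinputformat.split.maxsize does apply, since the input is then file-based) would be another way to rewrite the output into smaller chunks.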
 

Thanks,

Will

Community Manager

@arunr307, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.



Regards,

Vidya Sargur,
Community Manager

