I am trying to export a large HBase table into sequence files. Our requirement is to split the output into sequence files of a specific maximum size. We have tried multiple options, but the output part files are still generated with large sizes of around 6-7 GB. We used the sample export command below, which was meant to generate sequence files in the range of 100 MB to 135 MB, but it did not work as expected.
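To put the reported numbers in perspective, here is a small sketch of plain arithmetic. It computes how many equally sized files a single 6-7 GB part would have to be cut into to land inside the 100-135 MB window. It assumes "MB" and "GB" mean binary units (MiB/GiB), and the function names are illustrative only, not part of any HBase tool.

```python
import math

MB = 1024 ** 2  # assumption: binary megabytes (MiB)
GB = 1024 ** 3  # assumption: binary gigabytes (GiB)

def pieces_needed(part_bytes: int, max_piece_bytes: int) -> int:
    """Smallest number of equal pieces such that no piece exceeds max_piece_bytes."""
    return math.ceil(part_bytes / max_piece_bytes)

def piece_size(part_bytes: int, max_piece_bytes: int) -> float:
    """Size of each piece when the part is divided into pieces_needed() equal pieces."""
    return part_bytes / pieces_needed(part_bytes, max_piece_bytes)

for part in (6 * GB, 7 * GB):
    n = pieces_needed(part, 135 * MB)
    print(f"{part / GB:.0f} GB part -> {n} files of ~{piece_size(part, 135 * MB) / MB:.1f} MB")
```

With a 135 MB cap, a 6 GB part needs 46 files and a 7 GB part needs 54, each of which still stays above the 100 MB lower bound.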
What is the CDH version? Could you attach the full output of this command? The command's help output does not list any property that controls split size:
```
# hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]

  Note: -D properties will be applied to the conf used.
  For example:
   -D mapreduce.output.fileoutputformat.compress=true
   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
   -D mapreduce.output.fileoutputformat.compress.type=BLOCK
  Additionally, the following SCAN properties can be specified
  to control/limit what is exported..
   -D hbase.mapreduce.scan.column.family=<family1>,<family2>, ...
   -D hbase.mapreduce.include.deleted.rows=true
   -D hbase.mapreduce.scan.row.start=<ROWSTART>
   -D hbase.mapreduce.scan.row.stop=<ROWSTOP>
   -D hbase.client.scanner.caching=100
   -D hbase.export.visibility.labels=<labels>
For tables with very wide rows consider setting the batch size as below:
   -D hbase.export.scanner.batch=10
   -D hbase.export.scanner.caching=100
   -D mapreduce.job.name=jobName - use the specified mapreduce job name for the export
For MR performance consider the following properties:
   -D mapreduce.map.speculative=false
   -D mapreduce.reduce.speculative=false
```
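As a hedged illustration of how the -D options in this help text are passed, the sketch below only assembles an argument list. It does not run anything. The table name `my_table` and output path `/tmp/my_table_export` are hypothetical placeholders. As the usage line requires, every `-D key=value` pair goes before `<tablename>`. None of the listed properties sets an output file size. Export is a map-only job that, by default, runs one map task per region, so compression can shrink each part file but does not change how many part files are produced.

```python
from typing import Dict, List, Optional

EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"

def build_export_argv(table: str, output_dir: str,
                      props: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble `hbase Export` arguments; all -D options must precede <tablename>."""
    argv = ["hbase", EXPORT_CLASS]
    for key, value in (props or {}).items():
        argv += ["-D", f"{key}={value}"]
    argv += [table, output_dir]
    return argv

# Properties copied from the help text; none of them controls part-file size.
props = {
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.codec": "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.output.fileoutputformat.compress.type": "BLOCK",
    "mapreduce.map.speculative": "false",
}
# Hypothetical table name and output directory, for illustration only.
argv = build_export_argv("my_table", "/tmp/my_table_export", props)
print(" ".join(argv))
```

On a cluster edge node, the same list could be passed to `subprocess.run(argv, check=True)`. Building a list instead of a single string avoids shell-quoting issues with values such as class names.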