<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: HBase increase num of reducers for bulk loading with ImportTSV in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-increase-num-of-reducers-for-bulk-loading-with/m-p/39915#M25581</link>
    <description>The reduce phase of a bulk-load preparation job is used to align the output&lt;BR /&gt;HFiles with the regions of the target table. The number of reducers will&lt;BR /&gt;always equal the number of regions in the target table at the time the&lt;BR /&gt;job is launched.&lt;BR /&gt;&lt;BR /&gt;If you want more reducers, you will need to pre-split your table&lt;BR /&gt;appropriately. Read up more on pre-splitting at&lt;BR /&gt;&lt;A href="http://hbase.apache.org/book.html#manual_region_splitting_decisions" target="_blank"&gt;http://hbase.apache.org/book.html#manual_region_splitting_decisions&lt;/A&gt;&lt;BR /&gt;</description>
    <pubDate>Wed, 20 Apr 2016 09:07:09 GMT</pubDate>
    <dc:creator>Harsh J</dc:creator>
    <dc:date>2016-04-20T09:07:09Z</dc:date>
    <item>
      <title>HBase increase num of reducers for bulk loading with ImportTSV</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-increase-num-of-reducers-for-bulk-loading-with/m-p/39911#M25580</link>
      <description>&lt;P&gt;Hi dear experts!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm trying to load data with the &lt;A href="http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv" target="_self"&gt;ImportTSV tool&lt;/A&gt;, like this:&lt;/P&gt;&lt;P&gt;hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.reduces=1000 -Dimporttsv.columns="data:SS_SOLD_DATE_SK, HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*&lt;/P&gt;&lt;P&gt;but I get only one reducer (despite the &lt;SPAN&gt;-Dmapreduce.job.reduces=1000 setting&lt;/SPAN&gt;).&lt;/P&gt;&lt;P&gt;I even set &lt;SPAN&gt;mapreduce.job.reduces=1000 cluster-wide, but still get only one reducer.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Could anybody hint at how to resolve this?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you in advance for any input!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:14:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-increase-num-of-reducers-for-bulk-loading-with/m-p/39911#M25580</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2022-09-16T10:14:41Z</dc:date>
    </item>
    <item>
      <title>Re: HBase increase num of reducers for bulk loading with ImportTSV</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-increase-num-of-reducers-for-bulk-loading-with/m-p/39915#M25581</link>
      <description>The reduce phase of a bulk-load preparation job is used to align the output&lt;BR /&gt;HFiles with the regions of the target table. The number of reducers will&lt;BR /&gt;always equal the number of regions in the target table at the time the&lt;BR /&gt;job is launched.&lt;BR /&gt;&lt;BR /&gt;If you want more reducers, you will need to pre-split your table&lt;BR /&gt;appropriately. Read up more on pre-splitting at&lt;BR /&gt;&lt;A href="http://hbase.apache.org/book.html#manual_region_splitting_decisions" target="_blank"&gt;http://hbase.apache.org/book.html#manual_region_splitting_decisions&lt;/A&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 20 Apr 2016 09:07:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-increase-num-of-reducers-for-bulk-loading-with/m-p/39915#M25581</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-04-20T09:07:09Z</dc:date>
    </item>
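    <!-- The pre-splitting approach described in the answer above can be sketched
         end-to-end. This is a minimal, illustrative sketch for the asker's
         store_sales table; the split keys (and the assumption that row keys are
         numeric) are hypothetical, not from the thread:

```shell
# Pre-split the target table: 3 split keys yield 4 regions, so the
# bulk-load preparation job will launch 4 reducers. Split keys here
# are illustrative; choose boundaries that match your row-key design.
echo "create 'store_sales', {NAME => 'data'}, {SPLITS => ['1000', '2000', '3000']}" | hbase shell

# Re-run the bulk-load preparation. HFileOutputFormat sets the reducer
# count to the region count of the target table, which is why
# -Dmapreduce.job.reduces has no effect on this job.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,data:SS_SOLD_DATE_SK \
  -Dimporttsv.separator='|' \
  -Dimporttsv.bulk.output=/tmp/store_sales_hbase \
  store_sales /user/root/benchmarks/bigbench/data/store_sales/*
```
    -->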
    <item>
      <title>Re: HBase increase num of reducers for bulk loading with ImportTSV</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-increase-num-of-reducers-for-bulk-loading-with/m-p/39967#M25582</link>
      <description>&lt;P&gt;As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you'd get that many reducers. (Of course, pre-splitting needs a good idea of how your row keys are designed and is a broad topic in itself. See HBase: The Definitive Guide &amp;gt; Chapter 11 &amp;gt; Optimizing Splits and Compactions &amp;gt; Presplitting Regions.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example: when the table is pre-split into 6 regions:&lt;/P&gt;&lt;P&gt;hbase(main):002:0&amp;gt; create 'hly_temp2', {NAME =&amp;gt; 't', VERSIONS =&amp;gt; 1}, {SPLITS =&amp;gt; ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}&lt;/P&gt;&lt;P&gt;# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1&lt;/P&gt;&lt;P&gt;... ....&lt;/P&gt;&lt;P&gt;Job Counters&lt;/P&gt;&lt;P&gt;Launched map tasks=1&lt;/P&gt;&lt;P&gt;Launched reduce tasks=6 &amp;lt;&amp;lt;&amp;lt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 07:07:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/HBase-increase-num-of-reducers-for-bulk-loading-with/m-p/39967#M25582</guid>
      <dc:creator>AutoIN</dc:creator>
      <dc:date>2016-04-21T07:07:54Z</dc:date>
    </item>
  </channel>
</rss>

