Support Questions

Find answers, ask questions, and share your expertise

HBase increase num of reducers for bulk loading with ImportTSV

Rising Star

Hi dear experts!

 

I'm trying to load data with the ImportTsv tool, like this:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.reduces=1000 -Dimporttsv.columns="data:SS_SOLD_DATE_SK,HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*

but the job launches only one reducer (despite the -Dmapreduce.job.reduces=1000 setting).

I even set mapreduce.job.reduces=1000 cluster-wide, but still get only one reducer.

 

Could anybody hint at how to resolve this?

Thank you in advance for any input!

1 ACCEPTED SOLUTION

Master Collaborator

As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you'll get that many reducers. (Of course, pre-splitting requires a good idea of how your row keys are designed and is a broad topic in itself; see HBase: The Definitive Guide, Chapter 11 > Optimizing Splits and Compactions > Presplitting Regions.)

 

Example: when the table is pre-split into 6 regions (using 5 split points):

hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}

# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1

...

Job Counters

Launched map tasks=1

Launched reduce tasks=6 <<<
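
Note that with -Dimporttsv.bulk.output, ImportTsv only writes HFiles under the output directory; a separate completebulkload step is still needed to load them into the table. A minimal sketch, reusing the output path and table name from the question (adjust to your own paths):

# load the HFiles generated under /tmp/store_sales_hbase into the store_sales table
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/store_sales_hbase store_sales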


2 REPLIES

Mentor
The reduce phase of a bulk-load preparation job is used to align the output
files with the regions of the target table. The number of reducers will
always equal the number of regions in the target table at the time the job
is launched.

If you desire more reducers, you will need to pre-split your table
appropriately. Read up more on pre-splitting at
http://hbase.apache.org/book.html#manual_region_splitting_decisions
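
As a sketch of what that can look like: if your row keys are uniformly distributed hex-like strings, the HBase shell can pre-create the table with a fixed number of regions using RegionSplitter's split algorithms. The table name and column family below come from the question; the region count and split algorithm are illustrative assumptions, not a recommendation:

hbase(main):001:0> create 'store_sales', 'data', {NUMREGIONS => 100, SPLITALGO => 'HexStringSplit'}

If your keys are not hex-like, pass explicit split points via SPLITS instead, chosen from the actual key distribution. Once the table has N regions, the bulk-load preparation job will launch N reducers.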
