Expert Contributor
Posts: 87
Registered: ‎09-17-2014
Accepted Solution

HBase increase num of reducers for bulk loading with ImportTSV

Hi dear experts!

 

I'm trying to load data with the ImportTsv tool, like this:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.reduces=1000 -Dimporttsv.columns="data:SS_SOLD_DATE_SK, HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*

but I get only one reducer (despite the -Dmapreduce.job.reduces=1000 setting).

I even set mapreduce.job.reduces=1000 cluster-wide, but still get only one reducer.

 

Could anybody hint at how to resolve this?

Thank you in advance for any input!

Posts: 1,825
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: HBase increase num of reducers for bulk loading with ImportTSV

The reduce phase of a bulk-load preparation job is used to align the output
files with the number of regions in the target table. The number of
reducers will always equal the number of regions the target table has at
the time the job is launched.

If you want more reducers, you will need to pre-split your table
appropriately. Read more on pre-splitting at
http://hbase.apache.org/book.html#manual_region_splitting_decisions
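For example, pre-splitting could look like the following hbase shell command. This is a minimal sketch, not taken from the thread: the split keys below are hypothetical placeholders, and real split points should be chosen from the actual distribution of your row keys (here, SS_SOLD_DATE_SK values). Creating the table with nine split keys produces ten regions, so the subsequent ImportTsv bulk-load job would run with ten reducers:

```shell
# Hypothetical pre-split of the 'store_sales' table from the question.
# The split keys are illustrative only -- pick boundaries that evenly
# divide your real row-key range. 9 split keys => 10 regions => 10 reducers.
hbase shell <<'EOF'
create 'store_sales', 'data',
  SPLITS => ['1000000', '2000000', '3000000', '4000000',
             '5000000', '6000000', '7000000', '8000000', '9000000']
EOF
```

After the table exists with these regions, re-running the ImportTsv command from the original post should launch one reducer per region.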
Cloudera Employee
Posts: 72
Registered: ‎11-16-2015

Re: HBase increase num of reducers for bulk loading with ImportTSV


As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you'd get that many reducers. (Of course, pre-splitting requires a good idea of how your row keys are designed and is a broad topic in itself. See HBase: The Definitive Guide > Chapter 11 > Optimizing Splits and Compactions > Presplitting Regions.)

 

Example: when the table is pre-split into 6 regions:

hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}

# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1

... ....

Job Counters

Launched map tasks=1

Launched reduce tasks=6 <<<
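Note that when -Dimporttsv.bulk.output is used, ImportTsv only writes HFiles to that directory; a second step is needed to load them into the table. A sketch of that step, reusing the paths from the example above (in older HBase releases the loader class lives under org.apache.hadoop.hbase.mapreduce; newer releases moved it, so check your version's documentation):

```shell
# Load the generated HFiles from the bulk.output directory into the table.
# This assumes the HBase classpath is already set up by the 'hbase' launcher.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /user/hac/output/2-4 hly_temp2
```

Until this step runs, the data prepared by the 6 reducers is not visible in hly_temp2.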
