HBase increase num of reducers for bulk loading with ImportTSV

Rising Star

Hi dear experts!

 

I'm trying to load data with the ImportTSV tool, like this:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.reduces=1000 -Dimporttsv.columns="data:SS_SOLD_DATE_SK, HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*

but I get only one reducer (despite the -Dmapreduce.job.reduces=1000 setting).

I even set mapreduce.job.reduces=1000 cluster-wide, but still get only one reducer.

 

Could anybody suggest how to resolve this?

Thank you in advance for any input!

1 ACCEPTED SOLUTION


Re: HBase increase num of reducers for bulk loading with ImportTSV

Expert Contributor

As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you get that many reducers. (Of course, pre-splitting requires a good idea of how your row keys are designed and is a broad topic in itself. See HBase: The Definitive Guide, Chapter 11, "Optimizing Splits and Compactions" > "Presplitting Regions".)

 

Example: when the table is pre-split into 6 regions:

hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}

# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1

...

Job Counters

Launched map tasks=1

Launched reduce tasks=6 <<<
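Note that importtsv with -Dimporttsv.bulk.output only writes HFiles; they still have to be handed to the table in a separate step. A sketch of that follow-up step, using the paths from the example above (adjust the jar path to your installation):

```shell
# Load the generated HFiles into the table using the completebulkload
# tool shipped in the hbase-server jar
hadoop jar /usr/lib/hbase/hbase-server.jar completebulkload \
    /user/hac/output/2-4 hly_temp2
```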


2 REPLIES

Re: HBase increase num of reducers for bulk loading with ImportTSV

Master Guru
The reduce phase of a bulk-load preparation job is used to align the output files against the number of regions under the targeted table. You will always see the number of reducers equal the number of regions in the targeted table at the time the job is launched.

If you want more reducers, you will need to pre-split your table appropriately. Read more on pre-splitting at http://hbase.apache.org/book.html#manual_region_splitting_decisions
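As a concrete sketch of that pre-splitting step, using the store_sales table from the question (the region count and split algorithm here are illustrative assumptions, not a recommendation for your key design):

```shell
# From the HBase shell: create the target table pre-split into 100
# regions using the built-in HexStringSplit algorithm. This only
# distributes load well if your row keys are roughly uniform hex
# strings -- an assumption you must verify against your key design.
hbase shell <<'EOF'
create 'store_sales', {NAME => 'data'}, {NUMREGIONS => 100, SPLITALGO => 'HexStringSplit'}
EOF
```

With the table created this way, the bulk-load preparation job would launch 100 reducers, one per region.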
