HBase increase num of reducers for bulk loading with ImportTSV
Labels: Apache HBase
Created on ‎04-19-2016 03:29 PM - edited ‎09-16-2022 03:14 AM
Hi dear experts!

I'm trying to load data with the ImportTsv tool, like this:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.reduces=1000 -Dimporttsv.columns="data:SS_SOLD_DATE_SK,HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*

but the job gets only one reducer (despite the -Dmapreduce.job.reduces=1000 setting). I even set mapreduce.job.reduces=1000 cluster-wide, but still get only one reducer.

Could anybody hint at how to resolve this?

Thank you in advance for any input!
Created ‎04-20-2016 02:07 AM
When ImportTsv runs with -Dimporttsv.bulk.output, it determines the reducer count by partitioning the output files against the number of regions under the targeted table. You will always see the number of reducers equal the number of regions in the targeted table at the time the job is launched.

If you desire more reducers, you will need to pre-split your table appropriately. Read up more on pre-splitting at
http://hbase.apache.org/book.html#manual_region_splitting_decisions
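For example, instead of listing split points by hand, you can let HBase compute them with one of the built-in split algorithms. A minimal sketch from the HBase shell (the table and family names here are made up; HexStringSplit assumes hex-encoded row keys, and UniformSplit is the usual alternative for raw binary keys):

# create a table pre-split into 16 regions, with split points computed by HexStringSplit
hbase(main):001:0> create 'store_sales', 'data', {NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}

With 16 regions in place, the same ImportTsv bulk-output job should launch 16 reducers.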
Created on ‎04-21-2016 12:01 AM - edited ‎04-21-2016 12:07 AM
As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you'd get that many reducers. (Of course, pre-splitting requires a good idea of how your row keys are designed and is a broad topic in itself. See HBase: The Definitive Guide > Chapter 11 > Optimizing Splits and Compactions > Presplitting Regions.)

Example: when the table is pre-split into 6 regions:
hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}

# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1

...
Job Counters
Launched map tasks=1
Launched reduce tasks=6 <<<
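One follow-up note: -Dimporttsv.bulk.output only writes HFiles under the given output directory; they still have to be loaded into the table afterwards. A sketch using the completebulkload tool from the same jar (output path and table name reused from the example above):

# load the generated HFiles from /user/hac/output/2-4 into the table 'hly_temp2'
# hadoop jar /usr/lib/hbase/hbase-server.jar completebulkload /user/hac/output/2-4 hly_temp2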
