Created on 04-19-2016 03:29 PM - edited 09-16-2022 03:14 AM
Hi dear experts!
I'm trying to load data with the ImportTsv tool, like this:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.reduces=1000 -Dimporttsv.columns="data:SS_SOLD_DATE_SK, HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*
but the job runs with only one reducer, despite the -Dmapreduce.job.reduces=1000 setting.
I even set mapreduce.job.reduces=1000 cluster-wide, but still get only one reducer.
Could anybody suggest how to resolve this?
Thank you in advance for any input!
Created on 04-21-2016 12:01 AM - edited 04-21-2016 12:07 AM
As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1: when -Dimporttsv.bulk.output is used, the job derives its reducer count from the table's region boundaries (one reducer per region), so -Dmapreduce.job.reduces is ignored. If you pre-split the table before the import, you get as many reducers as there are regions. (Of course, pre-splitting requires a good idea of how your row keys are designed, and is a broad topic in itself; see HBase: The Definitive Guide, Chapter 11, "Optimizing Splits and Compactions" > "Presplitting Regions".)
Example: when the table is pre-split into 6 regions (5 split points):
hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}

# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1
...
Job Counters
Launched map tasks=1
Launched reduce tasks=6 <<<
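Applied to the original store_sales load, a minimal sketch of the same fix would look like the following. The split points here are hypothetical placeholders; you'd pick real ones from the actual distribution of your row keys. Four split points give 5 regions, and hence 5 reducers:

hbase(main):001:0> create 'store_sales', {NAME => 'data'}, {SPLITS => ['2450900', '2451500', '2452100', '2452700']}

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="data:SS_SOLD_DATE_SK,HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*

And since -Dimporttsv.bulk.output only generates HFiles, don't forget to load them into the table afterwards:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/store_sales_hbase store_sales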