
specifying the number of reducers for Phoenix bulk load tool

Is there a way to specify the number of reducers for the Phoenix CSV bulk load utility, which uses MapReduce to load data into HBase?

1 ACCEPTED SOLUTION

Super Collaborator

No. At the start of the MR job you may see a message like:

mapreduce.MultiHfileOutputFormat: Configuring 20 reduce partitions to match current region count

That's exactly the number of reducers that will be created. How many of them will be running in parallel depends on the MR engine configuration.


3 REPLIES

Super Collaborator

The MR job creates one reducer per region. So if you are loading data into an empty table, you may pre-split the table from the HBase shell or use salting during table creation.
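For illustration, both approaches can be sketched in Phoenix DDL. This is a minimal sketch; the table and column names are hypothetical, and the bucket/split values are just examples:

```sql
-- Option 1: salting. Phoenix pre-splits the table into SALT_BUCKETS
-- regions, so the bulk load job would get 8 reducers here.
CREATE TABLE MY_TABLE (
    id  BIGINT NOT NULL PRIMARY KEY,
    val VARCHAR
) SALT_BUCKETS = 8;

-- Option 2: explicit pre-split on row-key boundaries.
-- Three split points create 4 regions, hence 4 reducers.
CREATE TABLE MY_TABLE2 (
    id  VARCHAR NOT NULL PRIMARY KEY,
    val VARCHAR
) SPLIT ON ('b', 'm', 't');
```

Salting also helps avoid write hotspotting on monotonically increasing keys, while explicit split points are preferable when you know the key distribution in advance.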

Hi @ssoldatov, thanks for your reply. So it won't honour the command-line argument if I provide it like this:

hadoop jar phoenix-4.8.1-HBase-1.1-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dmapreduce.job.reduces=4 (not the full command)

though I specified 4 reducers, it considered only the region count.
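For context, a fuller invocation typically looks like the sketch below. The table name, input path, and ZooKeeper quorum are placeholders (the original post truncated the full command); `--table`, `--input`, and `--zookeeper` are standard CsvBulkLoadTool options:

```shell
# Hypothetical full invocation; values are placeholders.
hadoop jar phoenix-4.8.1-HBase-1.1-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table MY_TABLE \
    --input /tmp/data.csv \
    --zookeeper zk-host:2181
# Passing -Dmapreduce.job.reduces=4 has no effect here: the reducer
# count always matches the target table's current region count.
```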

