Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference; information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Specifying the number of reducers for the Phoenix bulk load tool

Is there a way to specify the number of reducers for the Phoenix CSV bulk load utility, which uses a MapReduce job to load data into HBase?


3 REPLIES

Super Collaborator

The MR job creates one reducer per region. So if you are loading data into an empty table, you can pre-split the table from the HBase shell or use salting during table creation, as sketched below.
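
For example, either approach could look like this (the table name, columns, bucket count, and split points are hypothetical):

-- Phoenix: salting pre-creates SALT_BUCKETS regions and spreads writes across them
CREATE TABLE MY_TABLE (ID BIGINT NOT NULL PRIMARY KEY, VAL VARCHAR) SALT_BUCKETS = 16;

# HBase shell: pre-split at explicit row-key boundaries instead
create 'MY_TABLE', 'CF', SPLITS => ['row1000', 'row2000', 'row3000']

Either way the table starts out with multiple regions, so the bulk load gets multiple reducers from the outset.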

Hi @ssoldatov, thanks for your reply. So it won't honour the command-line argument if I provide it like this:

hadoop jar phoenix-4.8.1-HBase-1.1-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dmapreduce.job.reduces=4 (not the full command)

Though I specified 4 reducers, the job did not honour that value.
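
For reference, a fuller invocation might look like the sketch below (the table name, input path, and ZooKeeper quorum are hypothetical; note that -D generic options must come before the tool-specific arguments):

hadoop jar phoenix-4.8.1-HBase-1.1-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    -Dmapreduce.job.reduces=4 \
    --table MY_TABLE \
    --input /tmp/data.csv \
    --zookeeper zk1,zk2,zk3:2181

As the accepted answer below explains, the -Dmapreduce.job.reduces=4 setting is overridden: the job sizes its reducer count to the table's region count.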

Super Collaborator
ACCEPTED SOLUTION

No. At the start of the MR job you may see a message like:

mapreduce.MultiHfileOutputFormat: Configuring 20 reduce partitions to match current region count

That's exactly the number of reducers that will be created. How many of them run in parallel depends on the MR engine configuration.
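
If you want to know the reducer count ahead of time, you can check the table's current region count from the HBase shell; a sketch, with 'MY_TABLE' as a hypothetical table name:

# hbase:meta row keys start with '<table>,', so this lists MY_TABLE's regions
scan 'hbase:meta', {COLUMNS => ['info:regioninfo'], ROWPREFIXFILTER => 'MY_TABLE,'}

# Newer shells (HBase 1.4+) also offer a direct command:
list_regions 'MY_TABLE'

The number of regions returned is the number of reducers the bulk load will configure.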