Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

MapReduce performance on the HBase input table.

New Member

When running a MapReduce job on an HDFS file, the number of mappers is based on the input split size. For a text input file in HDFS, the input split size usually matches the block size (I understand this is not always the case, but it is most of the time).

What happens when I run a MapReduce job with an HBase table as the input? How is the input split size determined for an HBase table, and how can I control the number of mappers when running MapReduce on top of HBase? Can someone guide me here? I am using HBase 0.98.4 and Hadoop 2.6.

1 ACCEPTED SOLUTION

Guru

TableInputFormat, the input format HBase provides for MapReduce, creates one map task per table region. The amount of data each mapper processes therefore depends on how big your regions are. To change the number of mappers, change the number of regions the scan covers: pre-split the table into more regions, or restrict the scan to a key range.
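A minimal driver sketch illustrating this with the HBase 0.98-era API. The table name `mytable` and the mapper class `MyTableMapper` are placeholders for your own job; `TableMapReduceUtil.initTableMapperJob` wires up TableInputFormat, which then yields one split (and one mapper) per region the scan touches.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class HBaseMrDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-table-scan");
        job.setJarByClass(HBaseMrDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // rows fetched per RPC; raise for scan-heavy MR jobs
        scan.setCacheBlocks(false); // recommended for MR full scans

        // Optionally restrict the key range: only regions that overlap
        // [startRow, stopRow) get a mapper, reducing the mapper count.
        // scan.setStartRow(Bytes.toBytes("row-0000"));
        // scan.setStopRow(Bytes.toBytes("row-9999"));

        TableMapReduceUtil.initTableMapperJob(
                "mytable",            // input table (placeholder name)
                scan,
                MyTableMapper.class,  // your TableMapper subclass (placeholder)
                ImmutableBytesWritable.class,
                Result.class,
                job);

        job.setNumReduceTasks(0);   // map-only example
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is a sketch, not a complete job: it needs an HBase cluster, the HBase client jars on the classpath, and a real `TableMapper` implementation. The other lever, pre-splitting the table into more regions (e.g. with the `split` command in the HBase shell), increases the mapper count without touching the job code.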

