Support Questions

Find answers, ask questions, and share your expertise

MapReduce performance on the HBase input table.

Explorer

When running an MR job on an HDFS file, the number of mappers is based on the input split size. It is usually true that the input split size matches the block size for a text input file in HDFS (I understand this is not always the case, but it is most of the time).

What happens when I run an MR job with an HBase table as the input? How does the input split size map onto the HBase table, and how can I control the number of mappers when running MR on top of an HBase table? Can someone guide me here? Using HBase 0.98.4 and Hadoop 2.6.

1 ACCEPTED SOLUTION

Guru

The TableInputFormat used for HBase input creates one map task per table region. How much data each mapper processes therefore depends on how big your regions are, and the only way to change the mapper count is to change the set of regions the scan covers.
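As a sketch of what that looks like in practice, assuming the 0.98-era `TableMapReduceUtil` API (the table name `mytable`, the mapper class, and the row-key bounds below are illustrative, not from the original post):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseScanSketch {

    // One instance of this mapper runs per region of the input table,
    // because TableInputFormat emits one input split per region.
    static class MyMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(rowKey.get()), ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-mr-sketch");
        job.setJarByClass(HBaseScanSketch.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // rows fetched per RPC; tune for throughput
        scan.setCacheBlocks(false);  // recommended off for full-table MR scans
        // Restricting the key range limits the job to the regions that
        // intersect it, which also reduces the number of map tasks, e.g.:
        // scan.setStartRow(Bytes.toBytes("a")); scan.setStopRow(Bytes.toBytes("m"));

        // TableInputFormat is wired in here by TableMapReduceUtil.
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, MyMapper.class,
                Text.class, IntWritable.class, job);

        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

To raise the mapper count rather than lower it, the lever in HBase 0.98 is the table itself: pre-split it (or split hot regions) so the scan covers more regions.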

