
MapReduce performance on the HBase input table.

Explorer

When running a MapReduce job on an HDFS file, the number of mappers is based on the input split size, and for a text input file the split size usually matches the HDFS block size (I understand this is not always the case, but it is most of the time).

What happens when I run a MapReduce job with an HBase table as the input? How is the input split size determined for an HBase table, and how can I control the number of mappers when running MapReduce on top of HBase? Can someone guide me here? I am using HBase 0.98.4 and Hadoop 2.6.

1 ACCEPTED SOLUTION

Guru

HBase's TableInputFormat creates one map task per table region, so the number of mappers equals the number of regions. The amount of data each mapper sees depends on how big your regions are; to control the mapper count, you effectively control the region count (for example, by pre-splitting the table).
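To make the split logic concrete, here is a minimal standalone sketch of the idea (this is illustrative code, not the actual HBase source): given a table's region boundaries, TableInputFormat produces roughly one input split per region, and the MapReduce framework launches one mapper per split. In a real job you would wire this up with TableMapReduceUtil.initTableMapperJob, which configures TableInputFormat for you; the region boundaries below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class RegionSplitSketch {
    // Sketch of TableInputFormat's behavior: one input split
    // (and therefore one mapper) per table region.
    static int numMappers(List<String[]> regions) {
        return regions.size();
    }

    public static void main(String[] args) {
        // A hypothetical table pre-split into 4 regions,
        // each described as [startKey, endKey).
        List<String[]> regions = Arrays.asList(
            new String[]{"",        "row-250"},
            new String[]{"row-250", "row-500"},
            new String[]{"row-500", "row-750"},
            new String[]{"row-750", ""});
        System.out.println("mappers = " + numMappers(regions));
    }
}
```

So with 4 regions the job runs 4 mappers; splitting a hot region into two would raise that to 5, and merging regions would lower it.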

