
MapReduce performance on the HBase input table.

Explorer

When running a MapReduce job on an HDFS file, the number of mappers is based on the input split size, and for a text input file the split size usually matches the HDFS block size (I understand this is not always the case, but it is most of the time).

What happens when I run a MapReduce job with an HBase table as the input? How is the input split size determined for an HBase table, and how can I control the number of mappers when running MapReduce on top of HBase? Can someone guide me here? I am using HBase 0.98.4 and Hadoop 2.6.

1 ACCEPTED SOLUTION

Guru

HBase's TableInputFormat creates one map task per table region, so the number of mappers equals the number of regions. The amount of data each mapper sees depends on how big your regions are; to control the mapper count, you effectively control the region count (for example, by pre-splitting the table).
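To make the split logic concrete, here is a minimal standalone sketch of the idea (this is illustrative code, not the actual HBase source): given a table's region boundaries, TableInputFormat produces roughly one input split per region, and the MapReduce framework launches one mapper per split. In a real job you would wire this up with TableMapReduceUtil.initTableMapperJob, which configures TableInputFormat for you; the region boundaries below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class RegionSplitSketch {
    // Sketch of TableInputFormat's behavior: one input split
    // (and therefore one mapper) per table region.
    static int numMappers(List<String[]> regions) {
        return regions.size();
    }

    public static void main(String[] args) {
        // A hypothetical table pre-split into 4 regions,
        // each described as [startKey, endKey).
        List<String[]> regions = Arrays.asList(
            new String[]{"",        "row-250"},
            new String[]{"row-250", "row-500"},
            new String[]{"row-500", "row-750"},
            new String[]{"row-750", ""});
        System.out.println("mappers = " + numMappers(regions));
    }
}
```

So with 4 regions the job runs 4 mappers; splitting a hot region into two would raise that to 5, and merging regions would lower it.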

