Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

MapReduce performance on the HBase input table.

New Member

When running a MapReduce job on an HDFS file, the number of mappers is based on the input split size. For a text input file in HDFS, the input split size usually matches the block size (I understand this is not always the case, but it is most of the time).

What happens when I run a MapReduce job with an HBase table as the input? How is the input split size determined for an HBase table, and how can I control the number of mappers when running MapReduce on top of HBase? Can someone guide me here? I am using HBase 0.98.4 and Hadoop 2.6.

1 ACCEPTED SOLUTION

Guru

TableInputFormat, the input format HBase provides for MapReduce, creates one map task per table region. The amount of data each mapper processes therefore depends on how big your regions are. To change the number of mappers, change the number of regions the scan covers: pre-split the table into more regions, or restrict the scan to a key range.
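A minimal driver sketch illustrating this with the HBase 0.98-era API. The table name `mytable` and the mapper class `MyTableMapper` are placeholders for your own job; `TableMapReduceUtil.initTableMapperJob` wires up TableInputFormat, which then yields one split (and one mapper) per region the scan touches.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class HBaseMrDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-table-scan");
        job.setJarByClass(HBaseMrDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // rows fetched per RPC; raise for scan-heavy MR jobs
        scan.setCacheBlocks(false); // recommended for MR full scans

        // Optionally restrict the key range: only regions that overlap
        // [startRow, stopRow) get a mapper, reducing the mapper count.
        // scan.setStartRow(Bytes.toBytes("row-0000"));
        // scan.setStopRow(Bytes.toBytes("row-9999"));

        TableMapReduceUtil.initTableMapperJob(
                "mytable",            // input table (placeholder name)
                scan,
                MyTableMapper.class,  // your TableMapper subclass (placeholder)
                ImmutableBytesWritable.class,
                Result.class,
                job);

        job.setNumReduceTasks(0);   // map-only example
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is a sketch, not a complete job: it needs an HBase cluster, the HBase client jars on the classpath, and a real `TableMapper` implementation. The other lever, pre-splitting the table into more regions (e.g. with the `split` command in the HBase shell), increases the mapper count without touching the job code.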

