MapReduce performance on the HBase input table.

arunkumar_d — Thu, 03 Nov 2016 19:48:17 GMT

When running a MapReduce job on an HDFS file, the number of mappers is based on the input split size. For a text input file in HDFS, the input split size usually matches the block size (not always, I understand, but most of the time).

What happens when I run a MapReduce job with an HBase table as the input? How does the input split size relate to the HBase table, and how can I control the number of mappers when running MapReduce on top of an HBase table? Can someone guide me here? I am using HBase 0.98.4 and Hadoop 2.6.

Re: MapReduce performance on the HBase input table.

Enis — Fri, 04 Nov 2016 00:56:32 GMT

The TableInputFormat used in HBase creates one map task per table region. The amount of data each map task reads depends on how big your regions are.
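
To make the one-map-task-per-region rule concrete, here is a minimal sketch in plain Java. It is a simplified model, not HBase code: the `RegionSplitModel` class, its `Region` type, and the example region boundaries are all hypothetical, and row keys are modeled as ASCII strings rather than byte arrays. It assumes that only regions overlapping the scan's start/stop row range produce a map task, which suggests one way to influence the mapper count: narrowing the scan range (for example with `Scan.setStartRow` and `Scan.setStopRow`) reduces it, while splitting the table into more regions increases it.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (NOT the HBase API): one map task per region that
// overlaps the scan range. All names here are hypothetical.
final class RegionSplitModel {

    /** A region covers row keys in [startKey, endKey); "" means unbounded. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** True if the region holds any row in [scanStart, scanStop); "" means unbounded. */
    static boolean overlaps(Region r, String scanStart, String scanStop) {
        boolean endsAfterScanStart = scanStart.isEmpty()
                || r.endKey.isEmpty()
                || r.endKey.compareTo(scanStart) > 0;
        boolean startsBeforeScanStop = scanStop.isEmpty()
                || r.startKey.compareTo(scanStop) < 0;
        return endsAfterScanStart && startsBeforeScanStop;
    }

    /** Number of map tasks in this model: one per overlapping region. */
    static int countMapTasks(List<Region> regions, String scanStart, String scanStop) {
        int tasks = 0;
        for (Region r : regions) {
            if (overlaps(r, scanStart, scanStop)) {
                tasks++;
            }
        }
        return tasks;
    }

    /** Hypothetical table with four regions split at "g", "n" and "t". */
    static List<Region> exampleRegions() {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("", "g"));
        regions.add(new Region("g", "n"));
        regions.add(new Region("n", "t"));
        regions.add(new Region("t", ""));
        return regions;
    }

    public static void main(String[] args) {
        List<Region> regions = exampleRegions();
        // Full-table scan: every region gets a map task.
        System.out.println(countMapTasks(regions, "", ""));
        // Scan limited to rows in ["h", "p"): only the overlapping regions count.
        System.out.println(countMapTasks(regions, "h", "p"));
    }
}
```

In this model the full-table scan yields four map tasks and the bounded scan yields two, since only the `[g, n)` and `[n, t)` regions contain rows in the requested range.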
