
MapReduce performance on the HBase input table.



New Contributor

When running an MR job over an HDFS file, the number of mappers is based on the input split size, and for a text input file in HDFS the split size usually matches the block size (I understand this is not always the case, but it is most of the time).

What happens when I run an MR job with an HBase table as the input? How is the input split size determined for an HBase table, and how can I control the number of mappers when running MR on top of HBase? Can someone guide me here? I'm using HBase 0.98.4 and Hadoop 2.6.

1 ACCEPTED SOLUTION


Re: MapReduce performance on the HBase input table.

Guru

HBase's TableInputFormat creates one map task per table region, so the number of mappers equals the number of regions in the input table. How much data each mapper processes depends on how big your regions are; to change the mapper count you change the region count, e.g. by pre-splitting the table into more regions.
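A real job would wire this up with HBase's TableMapReduceUtil.initTableMapperJob(...), which uses TableInputFormat under the hood. As a self-contained illustration of the split logic only (no HBase dependency), the hypothetical sketch below models how one input split is derived per region from the region start keys; the class, method, and row keys are made up for this example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how TableInputFormat derives splits:
// one split per region, bounded by the region's start and end keys.
public class RegionSplits {

    // regionStartKeys: sorted start keys of the table's regions
    // (the first region starts at the empty key "").
    // Returns [startKey, endKey) pairs -- one per region, i.e. one mapper each.
    static List<String[]> getSplits(List<String> regionStartKeys) {
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // A region's end key is the next region's start key;
            // the last region is unbounded above (empty end key).
            String end = (i + 1 < regionStartKeys.size())
                    ? regionStartKeys.get(i + 1)
                    : "";
            splits.add(new String[] { start, end });
        }
        return splits;
    }

    public static void main(String[] args) {
        // A table pre-split into 3 regions yields exactly 3 map tasks.
        List<String[]> splits = getSplits(List.of("", "row5000", "row9000"));
        System.out.println(splits.size()); // prints 3
    }
}
```

Under this model, controlling mapper parallelism for an HBase-input job means controlling the region layout of the table, not a split-size setting as with file input.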
