PROBLEM

When an HBase-backed table is queried through Hive, Hive always creates a fetch task instead of running a MapReduce job, regardless of the table size. The parameter hive.fetch.task.conversion.threshold controls whether a fetch task or a MapReduce job is run: if hive.fetch.task.conversion.threshold is less than the table size, Hive uses a MapReduce job. The default value of this parameter is 1 GB.
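The values in effect for a session can be checked from beeline before reproducing the problem. This is a minimal sketch: issuing SET with only a property name prints its current value, and the values on a given cluster may differ from the defaults.

```sql
-- Print the current fetch-conversion mode for this session
SET hive.fetch.task.conversion;

-- Print the current threshold in bytes (the 1 GB default is 1073741824)
SET hive.fetch.task.conversion.threshold;
```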

To reproduce, create an external table named 'hbase_hive' in Hive on top of an HBase table, and make sure the HBase table is larger than 1 GB.
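A table of this kind might be defined roughly as follows. This is only a sketch: the HBase table name 'hbase-hive' is taken from the HDFS path in this article, while the column family cf, the value column, and the column types are hypothetical placeholders, since the article does not show the actual schema.

```sql
-- Sketch of a Hive external table mapped onto an existing HBase table.
-- 'cf' and 'value' are assumed names; replace them with the real
-- column family and qualifier of the HBase table.
CREATE EXTERNAL TABLE hbase_hive (
  key   STRING,
  value STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value")
TBLPROPERTIES ("hbase.table.name" = "hbase-hive");
```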

[root@node1 ~]# hadoop fs -du -s -h /apps/hbase/data/data/default/hbase-hive 
3.4 G /apps/hbase/data/data/default/hbase-hive

From beeline, examine the explain plan. It shows a fetch task instead of a MapReduce job, even though the table is larger than 1 GB:

0: jdbc:hive2://node1.hwxblr.com:10000/> explain select * from hbase_hive where key = '111111A111111';
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |

ROOT CAUSE

Fetch task conversion means that Hive runs the query as a local task (inside the client itself) instead of submitting a job to the cluster. A Hive table backed by HBase has no statistics, so the input size that Hive estimates for it is always less than hive.fetch.task.conversion.threshold. As a result, Hive always launches the local fetch task on the client side, whatever the real size of the HBase table.

RESOLUTION

For Hive tables backed by HBase, set hive.fetch.task.conversion to 'minimal' in the session before executing the query. Do not set this property to 'minimal' permanently in hive-site.xml.
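In a beeline session this could look like the sketch below. In 'minimal' mode Hive converts only SELECT *, filters on partition columns, and LIMIT to a fetch task, so the filter on key is expected to push the query to a cluster job. The final statement assumes the cluster-wide value is 'more' (the Hive default in recent releases); use whatever value the cluster actually had before.

```sql
-- Session-level override only; hive-site.xml is left untouched
SET hive.fetch.task.conversion=minimal;

-- Re-check the plan: it should no longer consist of a single Fetch Operator stage
EXPLAIN SELECT * FROM hbase_hive WHERE key = '111111A111111';

-- Run the actual query
SELECT * FROM hbase_hive WHERE key = '111111A111111';

-- Restore the previous mode for the rest of the session (assumed to be 'more')
SET hive.fetch.task.conversion=more;
```

Keeping the override scoped to the session means that small queries on regular Hive tables still benefit from fetch task conversion.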
