Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Please see the Cloudera blog for information on the Cloudera Response to CVE-2021-4428

select count(*) does not have an entry in resourcemanager

From a table that has more than 3.9 Billion rows, I performed select count(*) and a select statement with a where condition as mentioned below, but this does not have an entry in yarn logs.

It should kick off a MR job and I should see some entries in logs, but I do not see so. Want to confirm if the mentioned 2 operations do kick of a MR job or NOT?

Select count(*) from xyz; 3989090043;

Select * from xyz where id=2253998123456607;

Thank you for your help in advance!

6 REPLIES 6

@Lakhman Pervatoju

Try as below

Select count(*) from xyz limit 2; and see if that solves your problem.

Technically it gathers information from table status, it doesn't trigger any MR job unless there is real aggrigation.

Sridhar,

Can you please check my comment.

Expert Contributor

@Lakhman Pervatoju The simple answer is Yes.

But, if you are using LLAP(im memory processing) then you may not see a new job in RM because it uses already running LLAP jobs, which keeps running in memory. If you are using Ambari View 2 then it runs a job first time and 2nd time it gets data from cache.

@Sridhar Reddy

Select count(*) from xyz limit 2;

This did kick off a job and made an entry in yarn logs.

From table status may be the first query can be processed with out triggering any jobs "select count(*)"

But why even while filtering a table by a where clause (in the 2nd query) does not trigger any MR jobs?

@Manish

We do not have LLAP configured

Cloudera Employee
@Lakhman Pervatoju

The reason for a MR job not being triggered while filtering a table is because for simple queries Hive does a direct fetch operation from HDFS which is much more faster/efficient than triggering a MR job. This is controlled by the property hive.fetch.task.conversion

Below is a detailed description of this property:

hive.fetch.task.conversion
Default Value: minimal in Hive 0.10.0 through 0.13.1, more in Hive 0.14.0 and laterAdded In: Hive 0.10.0 with HIVE-2925; default changed in Hive 0.14.0 with HIVE-7397Some select queries can be converted to a single FETCH task, minimizing latency. Currently the query should be single sourced not having any subquery and should not have any aggregations or distincts (which incur RS – ReduceSinkOperator, requiring a MapReduce task), lateral views and joins.
Supported values are none, minimal and more.
0. none:  Disable hive.fetch.task.conversion (value added in Hive 0.14.0 with HIVE-8389)
1. minimal:  SELECT *, FILTER on partition columns (WHERE and HAVING clauses), LIMIT only
2. more:  SELECT, FILTER, LIMIT only (including TABLESAMPLE, virtual columns)
"more" can take any kind of expressions in the SELECT clause, including UDFs.
(UDTFs and lateral views are not yet supported – see HIVE-5718.)