Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

select count(*) does not have an entry in resourcemanager

select count(*) does not have an entry in resourcemanager

From a table that has more than 3.9 Billion rows, I performed select count(*) and a select statement with a where condition as mentioned below, but this does not have an entry in yarn logs.

It should kick off a MR job and I should see some entries in logs, but I do not see so. Want to confirm if the mentioned 2 operations do kick of a MR job or NOT?

Select count(*) from xyz; 3989090043;

Select * from xyz where id=2253998123456607;

Thank you for your help in advance!

6 REPLIES 6
Highlighted

Re: select count(*) does not have an entry in resourcemanager

@Lakhman Pervatoju

Try as below

Select count(*) from xyz limit 2; and see if that solves your problem.

Technically it gathers information from table status, it doesn't trigger any MR job unless there is real aggrigation.

Highlighted

Re: select count(*) does not have an entry in resourcemanager

Sridhar,

Can you please check my comment.

Highlighted

Re: select count(*) does not have an entry in resourcemanager

Expert Contributor

@Lakhman Pervatoju The simple answer is Yes.

But, if you are using LLAP(im memory processing) then you may not see a new job in RM because it uses already running LLAP jobs, which keeps running in memory. If you are using Ambari View 2 then it runs a job first time and 2nd time it gets data from cache.

Highlighted

Re: select count(*) does not have an entry in resourcemanager

@Sridhar Reddy

Select count(*) from xyz limit 2;

This did kick off a job and made an entry in yarn logs.

From table status may be the first query can be processed with out triggering any jobs "select count(*)"

But why even while filtering a table by a where clause (in the 2nd query) does not trigger any MR jobs?

Highlighted

Re: select count(*) does not have an entry in resourcemanager

@Manish

We do not have LLAP configured

Highlighted

Re: select count(*) does not have an entry in resourcemanager

Cloudera Employee
@Lakhman Pervatoju

The reason for a MR job not being triggered while filtering a table is because for simple queries Hive does a direct fetch operation from HDFS which is much more faster/efficient than triggering a MR job. This is controlled by the property hive.fetch.task.conversion

Below is a detailed description of this property:

hive.fetch.task.conversion
Default Value: minimal in Hive 0.10.0 through 0.13.1, more in Hive 0.14.0 and laterAdded In: Hive 0.10.0 with HIVE-2925; default changed in Hive 0.14.0 with HIVE-7397Some select queries can be converted to a single FETCH task, minimizing latency. Currently the query should be single sourced not having any subquery and should not have any aggregations or distincts (which incur RS – ReduceSinkOperator, requiring a MapReduce task), lateral views and joins.
Supported values are none, minimal and more.
0. none:  Disable hive.fetch.task.conversion (value added in Hive 0.14.0 with HIVE-8389)
1. minimal:  SELECT *, FILTER on partition columns (WHERE and HAVING clauses), LIMIT only
2. more:  SELECT, FILTER, LIMIT only (including TABLESAMPLE, virtual columns)
"more" can take any kind of expressions in the SELECT clause, including UDFs.
(UDTFs and lateral views are not yet supported – see HIVE-5718.)
Don't have an account?
Coming from Hortonworks? Activate your account here