
Select count query taking too long

Expert Contributor

A select count on a table with 3 lakh (300,000) or more records takes more than 5 minutes. Is there a Hive tuning document we can refer to in order to optimize query performance?

11 Replies

Super Guru

If you are running the query on Tez, then this article on determining the number of reducers will be useful: https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer....
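For reference, a minimal sketch of the reducer-related settings that article discusses; the values below are illustrative placeholders, not recommendations for this particular cluster:

-- target amount of input data per reducer, in bytes (placeholder value)
set hive.exec.reducers.bytes.per.reducer=256000000;
-- hard upper bound on the number of reducers (placeholder value)
set hive.exec.reducers.max=1009;
-- let Tez adjust reducer parallelism at runtime based on actual data sizes
set hive.tez.auto.reducer.parallelism=true;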

Expert Contributor
@Rajkumar Singh

Thank you. I will check and get back to you.


@sindhu

Can you please help with this?

I tried running ANALYZE; it takes around 217 seconds. There are around 38 lakh (3.8 million) records.

analyze table schema.table compute statistics for columns;
Query ID = ec2-user_20171218003632_b89c66b2-2484-41b3-8d11-d0559e2b3ff7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1513235262783_0081, Tracking URL = http://ip-192-168-180-54.ca-central-1.compute.internal:8088/proxy/application_1513235262783_0081/
Kill Command = /usr/hdp/2.6.2.14-5/hadoop/bin/hadoop job  -kill job_1513235262783_0081
Hadoop job information for Stage-0: number of mappers: 3; number of reducers: 1
2017-12-18 00:36:41,824 Stage-0 map = 0%,  reduce = 0%
2017-12-18 00:37:42,572 Stage-0 map = 0%,  reduce = 0%, Cumulative CPU 189.89 sec
2017-12-18 00:38:32,943 Stage-0 map = 33%,  reduce = 0%, Cumulative CPU 352.13 sec
2017-12-18 00:38:37,056 Stage-0 map = 67%,  reduce = 0%, Cumulative CPU 359.55 sec
2017-12-18 00:38:43,223 Stage-0 map = 67%,  reduce = 22%, Cumulative CPU 366.29 sec
2017-12-18 00:39:43,898 Stage-0 map = 67%,  reduce = 22%, Cumulative CPU 428.53 sec
2017-12-18 00:40:06,476 Stage-0 map = 100%,  reduce = 22%, Cumulative CPU 454.93 sec
2017-12-18 00:40:07,503 Stage-0 map = 100%,  reduce = 67%, Cumulative CPU 455.44 sec
2017-12-18 00:40:08,526 Stage-0 map = 100%,  reduce = 100%, Cumulative CPU 457.46 sec
MapReduce Total cumulative CPU time: 7 minutes 37 seconds 460 msec
Ended Job = job_1513235262783_0081
MapReduce Jobs Launched:
Stage-Stage-0: Map: 3  Reduce: 1   Cumulative CPU: 457.46 sec   HDFS Read: 688303 HDFS Write: 2067 SUCCESS
Total MapReduce CPU Time Spent: 7 minutes 37 seconds 460 msec
OK
Time taken: 216.908 seconds

I tried running the count query after ANALYZE and it still takes 5 minutes. Can you please help tune Hive so it works faster? We use the MapReduce execution engine.

Query ID = ec2-user_20171218004512_6bef2ddb-d981-42f8-b2e5-c42a9ad80bfd
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1513235262783_0082, Tracking URL = http://ip-192-168-180-54.ca-central-1.compute.internal:8088/proxy/application_1513235262783_0082/
Kill Command = /usr/hdp/2.6.2.14-5/hadoop/bin/hadoop job  -kill job_1513235262783_0082
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-12-18 00:45:22,022 Stage-1 map = 0%,  reduce = 0%
2017-12-18 00:46:22,804 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 189.83 sec
2017-12-18 00:46:43,363 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 251.69 sec
2017-12-18 00:46:46,436 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 259.47 sec
2017-12-18 00:46:54,666 Stage-1 map = 67%,  reduce = 22%, Cumulative CPU 269.17 sec
2017-12-18 00:47:49,101 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 328.79 sec
MapReduce Total cumulative CPU time: 5 minutes 28 seconds 790 msec
Ended Job = job_1513235262783_0082
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 328.79 sec   HDFS Read: 330169 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 5 minutes 28 seconds 790 msec
OK
3778700
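As an aside, the ANALYZE statement above gathers column statistics; the basic table-level statistics (row count, total size) are gathered by a separate statement. A minimal sketch, reusing the schema.table name from the output above:

-- gather basic table-level statistics (numRows, totalSize, etc.) into the metastore
analyze table schema.table compute statistics;

-- gather column-level statistics, as already run above
analyze table schema.table compute statistics for columns;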


@Ashnee Sharma

Try setting the below parameters and then run select count(*) (a fuller end-to-end sketch follows after the list):

set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.fetch.task.conversion=more;
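For context, with hive.compute.query.using.stats enabled and valid statistics in the metastore, a plain count(*) can be answered from the metastore without launching a job. A minimal sketch, assuming the schema.table name from the earlier posts:

-- make sure basic stats exist for the table (row count stored in the metastore)
analyze table schema.table compute statistics;

-- answer simple aggregates such as count(*) from metastore statistics
set hive.compute.query.using.stats=true;

-- with valid stats, this should return without launching a MapReduce/Tez job
select count(*) from schema.table;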


@Sindhu

I have tried with the above values.

It is still taking time. There are around 38 lakh (3.8 million) records.

Query ID = ec2-user_20171221001453_85ebceae-ab99-4a88-aadf-8f3eb6d05fdb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1513772289178_0006, Tracking URL = http://ip-192-168-180-54.ca-central-1.compute.internal:8088/proxy/application_1513772289178_0006/
Kill Command = /usr/hdp/2.6.2.14-5/hadoop/bin/hadoop job  -kill job_1513772289178_0006
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-12-21 00:15:10,576 Stage-1 map = 0%,  reduce = 0%
2017-12-21 00:16:11,542 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 192.03 sec
2017-12-21 00:16:34,173 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 260.89 sec
2017-12-21 00:16:38,289 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 270.02 sec
2017-12-21 00:16:45,514 Stage-1 map = 67%,  reduce = 22%, Cumulative CPU 279.97 sec
2017-12-21 00:17:35,937 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 330.59 sec
2017-12-21 00:17:36,967 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 332.77 sec
MapReduce Total cumulative CPU time: 5 minutes 32 seconds 770 msec
Ended Job = job_1513772289178_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 332.77 sec   HDFS Read: 322983 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 5 minutes 32 seconds 770 msec
OK
3778700


Hi @Ashnee Sharma

Based on the logs, I can see that when you run a count query it triggers a MapReduce job, which is what takes the time. Could you run set hive.stats.fetch.column.stats; and verify that its value is true?

When this property is enabled, the stats should be fetched from the information available in the metastore, which will not trigger any job when you run a count query. It should work regardless of whether you are using MR or Tez as the execution engine.
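One way to check, as a sketch (table name reused from the earlier ANALYZE output): printing a property without a value shows its current session setting, and the table's metastore parameters should include the row count once statistics have been gathered.

-- print the current session values of the stats-related properties
set hive.compute.query.using.stats;
set hive.stats.fetch.column.stats;

-- the "Table Parameters" section should list numRows and COLUMN_STATS_ACCURATE once stats exist
describe formatted schema.table;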

Hope it helps!!


@Bala Vignesh N V

Tried, but the result is still the same.

Rising Star

If you don't mind, can you post the query which you are using to get the count?


@srinivasa rao

The query is simple: select count(*) from rasdb.test;

This table has 3,800,000 records.