SELECT COUNT query takes a long time in Hive on Tez

New Contributor

When I run the Hive query

select count(*) from mytable

it takes a long time: for a table of 27 million rows, it runs for 30 minutes. I use HDP 2.6.4 with Hive 1.2.1000 and Tez 0.7.0.


Best regards

3 REPLIES

Super Guru

@Abderrahim BOUDI


Good articles on tuning Hive performance: Hive_performance_tune, Tez_Performance_Tune, and ExplainPlan.

This question is too broad to answer precisely, but here are my thoughts:

1. Check whether your Hive job has actually started running in the ResourceManager (rather than waiting in the queue for resources, i.e. in the ACCEPTED state).

2. Check in HDFS how many files are in the table's directory. Too many small files result in poor performance; consolidate the small files into larger ones, then run the query again.

3. Try running the Hive console in debug mode to see where the job spends its time.

4. Check whether there is any skew in the data, and if so create the table declaring the skewed columns in the table definition.
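
A few of the checks above can be sketched in HiveQL. This is only a rough illustration, not a definitive recipe: `mytable` comes from the question, while `mytable_skewed`, its columns, and the skewed values are hypothetical names made up for the example. `CONCATENATE` applies only to ORC and RCFile tables, and a partitioned table would need a `PARTITION (...)` clause.

```sql
-- Show the execution plan to see which stages the count is split into.
EXPLAIN SELECT COUNT(*) FROM mytable;

-- Table metadata: the Location field gives the HDFS directory to inspect,
-- and numFiles / totalSize (when statistics exist) hint at a small-files problem.
DESCRIBE FORMATTED mytable;

-- Merge small files in place. Assumption: the table is stored as ORC or RCFile.
ALTER TABLE mytable CONCATENATE;

-- Declare known skewed values when creating a table.
-- Hypothetical table, columns, and values, for illustration only.
CREATE TABLE mytable_skewed (
  id BIGINT,
  customer_id STRING
)
SKEWED BY (customer_id) ON ('A', 'B');
```

If the table is stored as plain text, `CONCATENATE` is not available; rewriting the data into a new table is one alternative way to consolidate the files.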

New Contributor
@Shu 

By default, hive.exec.reducers.bytes.per.reducer is set to 64 MB. As I am on Hadoop 2.7, should I set it to 128 MB, or to 256 MB as indicated in the documentation you shared with me?
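
For reference, a minimal sketch of trying either value for a single session before changing the cluster default. The property takes a size in bytes; which value works better depends on the data and the cluster, so this only shows how to compare the two.

```sql
-- 128 MB = 128 * 1024 * 1024 bytes
SET hive.exec.reducers.bytes.per.reducer=134217728;
SELECT COUNT(*) FROM mytable;

-- 256 MB = 256 * 1024 * 1024 bytes
SET hive.exec.reducers.bytes.per.reducer=268435456;
SELECT COUNT(*) FROM mytable;

-- Print the value currently in effect for this session.
SET hive.exec.reducers.bytes.per.reducer;
```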
