Hive functions slow down my query too much

avatar
New Contributor

Hello, I'm new to the community and to Cloudera/big data in general.

I am having issues with Hive performance. For example, I have a table of 600 records: a select * runs in 0.05 seconds, but a count(*) or any other function takes about 17 seconds. Does anyone have tips for checking performance, or know which parameters to check or modify to improve this execution time?

My environment is CDH 6.1.0 with Hive 2.1.1-cdh6.1.0.

Thank you in advance,

Ulises Rangel

4 REPLIES

avatar

Hi Ulises,

This is expected.

When you run select * without any aggregation or function, Hive can read the data directly from the files in HDFS.

A count, however, requires computation: Hive has to create a job and run the aggregation, and that takes time even for a small table.
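
One way to see this difference for yourself is to compare the query plans. The sketch below assumes a placeholder table name, my_table, standing in for the 600-record table from the question. On a typical setup the first plan should contain only a fetch stage, while the second should contain a MapReduce (or Spark) stage, though the exact output depends on your configuration.

```sql
-- my_table is a hypothetical name; substitute your own table.

-- Plain select: usually served by a fetch task that reads the files directly.
EXPLAIN SELECT * FROM my_table;

-- Aggregation: usually compiled into a job on the execution engine.
EXPLAIN SELECT COUNT(*) FROM my_table;
```

If the second plan shows a job stage, most of the 17 seconds is likely job startup and scheduling rather than work on the data itself.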

 

avatar
New Contributor

Thanks for the reply.

I know this is normal, but is there anything I could check to find out whether something is wrong with my configuration? Or maybe a job trace? I am a newbie in these topics.

avatar

Yes, there are a lot of places to check, but without knowing what you are looking for you will get lost.

You can start with what you see on the screen/console where you run the query.

In Beeline you see the Tez job summary, which has a lot of details to look at.

One example of a tuning guide is below:

https://community.cloudera.com/t5/Community-Articles/Demystify-Apache-Tez-Memory-Tuning-Step-by-Step...

Update: I see you are using CDH 6, which does not have Tez.

You can refer to the link below for CDH:
https://docs.cloudera.com/documentation/enterprise/6/6.1/topics/admin_hive_tuning.html#concept_u51_l...
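
As a starting point for a small table like this one, a few standard Hive properties can be inspected from the Beeline session. This is only a sketch: my_table is a placeholder name, these settings are not taken from the linked guide, and whether they help depends on the cluster, so treat them as things to try rather than a recommended configuration.

```sql
-- Print the current values (SET with no value shows the setting).
SET hive.execution.engine;        -- mr or spark on CDH 6
SET hive.fetch.task.conversion;   -- controls which queries skip job launch

-- Let Hive run small MapReduce jobs locally instead of on the cluster.
SET hive.exec.mode.local.auto=true;

-- Gather table statistics, then allow simple aggregates such as COUNT(*)
-- to be answered from those statistics when they are available.
ANALYZE TABLE my_table COMPUTE STATISTICS;
SET hive.compute.query.using.stats=true;

SELECT COUNT(*) FROM my_table;
```

Properties set this way apply only to the current session, so they are safe to experiment with before changing anything cluster-wide.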

avatar
New Contributor

Thanks, I will start from there.