Support Questions

Find answers, ask questions, and share your expertise

How do you speed up count(*) on tables in Hive

avatar
Master Guru

I have a table with a few thousand rows in a small dev cluster and I am running a count on a ORC table and Tez is enabled.

The count takes a really really really long time or times out.

Any suggestions?

8 REPLIES 8

avatar
Master Guru

@Timothy Spann counts on ORC tables should be fast as it can use the strip footer info and run much faster. Have you run stats on the table?

avatar
Super Guru

@Timothy Spann

Have you run analyze table <tablename> compute statistics for columns?

https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

avatar
Guru

@Timothy Spann

Can you help with the explain plan along with hive -hiveconf hive.root.logger=debug,console -e 'query' output

avatar
Contributor

Hi have you solve this problem? I am also having same problem as yours.

avatar
Master Mentor

@Christian Lunesa

If you are only interested in the number and not to display all lines in the table then try

select count(1) from table_name;

That should be faster, hope that helps !!!

avatar
Master Mentor

@Christian Lunesa

If you are only interested in the number and not to display all lines in the table then try

select count(1) from table_name;

That should be faster, hope that helps !!!

avatar
Rising Star

@Geoffrey Shelton Okot it is also taking the same time. I run query select count(ayx) from test_table. It took 17 Sec. and while i run select count(1) from test_table, the time remains same.

Could you please explore more about hive query fast processing.

Thanks

avatar
New Contributor

This is an old thread, but for anyone looking at it, count(1) is the same query as count(*), so there are no performance benefits to using one over the other.