Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive query on a bucketed table is slow

Highlighted

Hive query on a bucketed table is slow

New Contributor

I have 2 tables.

One is bucketed, the other is not - other than that they are identical.

Both stored as orc and partitioned.

Querying the bucketed table on a given partition is much slower than querying its twin on the same partition.

Is it expected?

If not what can be the reasons for it?

If yes then should I conclude my motivation to hold a bucketed table is to improve joins/improve sampling/allow streaming, and in this case hold 2 copies of the table - one bucketed and the other not (sounds wasteful)?

2 REPLIES 2

Re: Hive query on a bucketed table is slow

Guru

@Michael Belostoky

Bucketing itself was introduced for performance optimization, so that only a portion of data is used especially during map join operations. Thus buckets create smaller chunks of data which can be kept in memory in form of hash tables and then can be used by joins.

What is the query you are executing ?

Re: Hive query on a bucketed table is slow

New Contributor

"select colA,count(*) from my_table where ds="2015-09-18" group by colA"

the "2015-09-18" is a partition on both tables so you would expect similar performance.

Don't have an account?
Coming from Hortonworks? Activate your account here