What happens when we use bucketing on our data to get quick results?

Hi All,

I am currently learning Hive and came across a topic called 'bucketing', which is said to improve query performance. Can anyone explain under what circumstances we should use bucketing, and how to decide on the number of buckets? One of my seniors told me that we can use any number of buckets for the data in a project. In that case, if we have a large number of buckets, will that improve performance or slow it down?

Here is one scenario: suppose we have a very large amount of data and have used both partitioning and bucketing for fast data retrieval. When a query runs, will Hive search each and every bucket (if a large number of buckets is used)? Does that slow down fetching the data?
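
To make the scenario concrete, here is a minimal sketch of the kind of table and query I have in mind. The table name, column names, and the bucket count of 32 are all hypothetical and chosen only for illustration; they are not from a real project.

```sql
-- Hypothetical table: partitioned by date, bucketed by customer_id.
-- The bucket count (32) is an arbitrary value for illustration.
CREATE TABLE sales_bucketed (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- The kind of lookup I am asking about: the partition filter narrows
-- the search to one date, but are all 32 buckets of that partition
-- then read, or only the bucket that customer_id = 42 belongs to?
SELECT order_id, amount
FROM sales_bucketed
WHERE order_date = '2019-01-01'
  AND customer_id = 42;
```

My question is how the answer changes if the 32 above were a much larger number.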

Please share your knowledge on the above. Happy learning! :)