I am currently learning Hive and came across a feature called 'bucketing', which is used to improve query performance. Can anyone explain under what circumstances we should use bucketing, and how to decide the number of buckets? One of my seniors told me that we can use any number of buckets for a project's data on the filesystem. In that case, if we have a large number of buckets, will that improve performance or slow things down?
Here is one scenario: suppose we have a very large amount of data and have used both partitioning and bucketing for fast data retrieval. When a query runs, will Hive search each and every bucket (if a large number of buckets is used)? Would that slow down fetching the data?
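To make the scenario concrete, here is how I currently picture bucket assignment. This is only a simplified sketch in Python, not Hive's actual code: I am assuming that an integer key hashes to itself and that the bucket is the hash modulo the number of buckets, and the names `bucket_for`, `NUM_BUCKETS` and `user_id` are made up for illustration.

```python
def bucket_for(key: int, num_buckets: int) -> int:
    """Return the bucket index for an integer key.

    Assumption: an integer key hashes to its own value; the mask keeps
    the hash non-negative before taking it modulo the bucket count.
    """
    return (key & 0x7FFFFFFF) % num_buckets


NUM_BUCKETS = 32   # hypothetical bucket count for the table
user_id = 1001     # hypothetical value of the bucketing column

# Under this model, a lookup on the bucketing column maps to one bucket,
# no matter how many buckets the table has.
print(bucket_for(user_id, NUM_BUCKETS))
```

If this picture is right, a lookup on the bucketing column would only need to read one bucket inside the matching partition, however many buckets exist. What I am unsure about is whether Hive really skips the other buckets, or still opens all of them.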
Please share your knowledge on the above. Happy learning! :)