Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive partition and bucketing

avatar
Rising Star

Hi geeks,

I am creatting hive table using below commands

CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING) PARTITIONED BY(timestamp STRING) CLUSTERED BY(user_id) INTO 25 BUCKETS;

on daily basis I am collecting records from mysql to pasting it to HDFS and creating partiton ( using add partition command ).

1. after creating patition dynamically , will 25 files create for bucketing ?

2. what will happen my unique user_id more than 25 (say 40 ) . How this will distribute to 25 buckets ?

1 ACCEPTED SOLUTION

avatar
Super Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
2 REPLIES 2

avatar
Super Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Guru

This should be easy enough for you to test:

1. Insert values 1 to 40 for column user_id into table user_info_bucketed

2. Insert around 440 rows from 41 to 440

3. Ideally, each bucket should have about 19 rows, or around that

4. You can then check something like:

SELECT user_id,INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 5;
SELECT user_id,INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 50;
SELECT user_id,INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 101;
SELECT user_id,INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 160;

OR you can check the physical location of the file on HDFS to determine the line count.