
Hive - Deciding the number of buckets


Re: Hive - Deciding the number of buckets

Yes, that is the point. You still get 30 files per partition, which should be fine. But loading a couple of terabytes with 30 reducers will take forever (in contrast to loading a single partition, where 30 writers would be plenty).
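Benjamin's contrast can be sketched as follows (table and column names are hypothetical, not from this thread):

```sql
-- Loading a single partition of a table with 30 buckets:
-- 30 reducers, each writing one bucket file, is plenty.
SET hive.enforce.bucketing = true;  -- needed before Hive 2.x; always on later

INSERT INTO TABLE sales_bucketed PARTITION (date_key = '2016-01-01')
SELECT col_a, col_b
FROM sales_raw
WHERE date_key = '2016-01-01';
```

A full-table load through the same 30 reducers, by contrast, funnels the entire dataset through 30 writers at once.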

Re: Hive - Deciding the number of buckets

New Contributor

Hello @Benjamin,

What would be the best way to convert a very large non-bucketed table into a bucketed one (8 buckets, partitioned by date, with about 1,700 partitions currently)?

Every time I run my insert query:

INSERT INTO NEW_BUCKETED_TABLE PARTITION (DATE_KEY)
SELECT ALL_COLUMNS, PARTITION_COLUMN FROM NON_BUCKETED_TABLE;

it always creates a number of reducers equal to the number of buckets, and starts failing after some time because each reducer's capacity is limited and the data is very large.

How can I tackle this problem?

Thanks,

Manu
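Following Benjamin's observation that 30 writers are plenty for a single partition, one common workaround (a sketch only; the date range and settings are assumptions) is to split the conversion into batches of partitions rather than one giant insert:

```sql
SET hive.enforce.bucketing = true;            -- needed before Hive 2.x
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Load a limited range of date partitions per run, and repeat for
-- each range, so every job pushes a manageable amount of data
-- through the fixed number of reducers (one per bucket).
INSERT INTO NEW_BUCKETED_TABLE PARTITION (DATE_KEY)
SELECT ALL_COLUMNS, PARTITION_COLUMN
FROM NON_BUCKETED_TABLE
WHERE PARTITION_COLUMN BETWEEN '2016-01-01' AND '2016-01-31';
```

Each batch then behaves like the small-load case Benjamin describes, at the cost of running the insert several times.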

Re: Hive - Deciding the number of buckets

New Contributor

@Benjamin Leonhardi @Manu Batham

We are also facing the same problem. Is there any solution to this?


Re: Hive - Deciding the number of buckets

New Contributor

A rule of thumb for the optimal number of buckets is (B * hash table size of the table) / total memory of the node, where B = 1.01.
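As a worked example of that rule of thumb (the sizes are hypothetical), with a roughly 240 GB hash table and 32 GB of memory per node:

```sql
-- 1.01 * 240 / 32 = 7.575, rounded up to 8 buckets
SELECT CEIL(1.01 * 240 / 32) AS optimal_buckets;
```

The idea is that each bucket's share of the hash table should fit in a node's memory, with B adding about 1% headroom.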