Yes, that is the point. You still get 30 files per partition, which should be fine. But loading a couple of terabytes with only 30 reducers will take forever. (In contrast, when loading a single partition, 30 writers would be plenty.)
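A minimal sketch of the partition-at-a-time approach described above, using the table and column names from the question below (the literal date value is a placeholder, and `hive.enforce.bucketing` applies to older Hive versions — Hive 2.x enforces bucketing automatically):

```sql
-- Load one date partition per statement, so the bucket reducers
-- only ever process a single partition's worth of data.
SET hive.enforce.bucketing = true;  -- needed on Hive 1.x; implicit on 2.x+

INSERT INTO NEW_BUCKETED_TABLE PARTITION (DATE_KEY = '2017-01-01')
SELECT ALL_COLUMNS
FROM NON_BUCKETED_TABLE
WHERE DATE_KEY = '2017-01-01';
```

The ~1700 dates can then be iterated from a shell script or workflow scheduler (e.g. Oozie or Airflow), submitting one such INSERT per partition, possibly several in parallel.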
What would be the best way to convert a very large non-bucketed table into a bucketed table (8 buckets, partitioned by date, currently about 1700 partitions)?
Every time I run my insert query:

INSERT INTO NEW_BUCKETED_TABLE PARTITION (DATE_KEY) SELECT ALL_COLUMNS, PARTITION_COLUMN FROM NON_BUCKETED_TABLE;
it always creates as many reducers as there are buckets (8), and they start failing after some time because each reducer's capacity is limited and the data is very large.

How do I tackle this problem?