Can't we load data into bucketed tables using the LOAD statement? As I understand it, the number of reducers is determined by the number of buckets. Am I right? But it doesn't work when I load the data manually.

New Contributor
Re: Can't we load data into bucketed tables using the LOAD statement?

Mentor

The LOAD section of the manual does not mention bucketed tables; LOAD is a simple copy/move statement: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

Have you seen these pages? They explain the correct way to populate and use bucketed tables:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSorted...

The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.
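
As a rough illustration of the advice quoted above, one common pattern is to LOAD the raw file into a plain staging table and then let Hive distribute the rows with an INSERT ... SELECT. This is only a sketch: the table names, columns, bucket count, delimiter, and HDFS path are all hypothetical, and `hive.enforce.bucketing` applies only to Hive releases before 2.0 (later releases always enforce bucketing, so the SET line should be left out there).

```sql
-- Hypothetical names throughout: users_staging, users_bucketed, /tmp/users.csv.

-- Plain (unbucketed) staging table; LOAD DATA is safe here because it only moves the file.
CREATE TABLE users_staging (user_id BIGINT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Target table, bucketed on user_id.
CREATE TABLE users_bucketed (user_id BIGINT, name STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS;

LOAD DATA INPATH '/tmp/users.csv' INTO TABLE users_staging;

-- Hive before 2.0 only: have Hive use one reducer per bucket and hash rows on the bucketing column.
SET hive.enforce.bucketing = true;

-- The insert runs a job that hashes user_id and writes one file per bucket.
INSERT OVERWRITE TABLE users_bucketed
SELECT user_id, name FROM users_staging;
```

Without `hive.enforce.bucketing` on those older releases, the manual alternative described in the quoted text is to set the reducer count to the bucket count yourself and add `CLUSTER BY user_id` to the SELECT.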

Re: Can't we load data into bucketed tables using the LOAD statement?

New Contributor

You can load data into a bucketed table, but you as the user have to ensure that the number of files is correct, the file naming is consistent, and the content of each file is properly hashed. If any of these is wrong, the result is undefined behavior when the table is used in joins that take bucketing into account.

HIVE-15148 guards against this by introducing a new parameter that you can use to explicitly disable LOAD DATA on bucketed tables.
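
The reply does not name the parameter. To the best of my knowledge it is `hive.strict.checks.bucketing`, but treat that name as an assumption and check the configuration reference for your Hive version before relying on it. A minimal sketch of turning the check on for a session:

```sql
-- Assumed property name (not stated in the thread); verify it exists in your Hive release.
-- When enabled, LOAD DATA into a bucketed table is rejected instead of silently
-- copying files that may not match the table's bucketing.
SET hive.strict.checks.bucketing = true;
```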