Can I create a primary key in a Hive table? I saw that in TBLPROPERTIES you can specify "PRIMARY KEY"="col_name". What does it actually do?

1 ACCEPTED SOLUTION

@Abdus Sagir Mollah The primary key designation in TBLPROPERTIES is simply metadata describing the column. Hive does not enforce any uniqueness or referential constraint based on it.
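
To illustrate the point, here is a minimal HiveQL sketch. The table and column names (customer, customer_id, name) are hypothetical, and the INSERT ... VALUES syntax assumes Hive 0.14 or later. Because the property is only a key/value pair stored with the table metadata, both inserts should be accepted even though they share the same key value.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE customer (
  customer_id BIGINT,
  name        STRING
)
TBLPROPERTIES ("PRIMARY KEY" = "customer_id");

-- Neither insert is checked against the "PRIMARY KEY" property,
-- so the duplicate customer_id is not rejected.
INSERT INTO TABLE customer VALUES (1, 'Alice');
INSERT INTO TABLE customer VALUES (1, 'Bob');

-- Lists the table properties, including the "PRIMARY KEY" entry.
SHOW TBLPROPERTIES customer;
```

If uniqueness matters, it has to be guaranteed by the process that loads the data, for example by de-duplicating on the key before the insert.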

@Abdus Sagir Mollah

Primary keys can also be useful for bucketing (i.e., partitioning of data), especially if you are trying to leverage the ACID capabilities of Hive.

A quote from the blog post linked below:

  • Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g., customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys.

Entire blog: http://hortonworks.com/blog/adding-acid-to-apache-hive/
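
As a rough sketch of the bucketing idea described above, the following HiveQL declares a dimension table bucketed on its key column and marked as transactional. The table name, columns, and bucket count are hypothetical. It also assumes a Hive version with ACID support (0.14 or later) and that the transaction manager settings shown in the comments are enabled. The table is bucketed but not sorted here, since to my knowledge Hive's ACID write path does not accept sorted tables.

```sql
-- Assumed session settings for ACID tables (check your distribution's docs):
-- SET hive.support.concurrency = true;
-- SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- SET hive.enforce.bucketing = true;  -- needed on Hive versions before 2.0

-- Hypothetical dimension table; names and bucket count are illustrative.
CREATE TABLE customer_dim (
  customer_id BIGINT,
  name        STRING,
  city        STRING
)
CLUSTERED BY (customer_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional" = "true");

-- Row-level update on a non-bucketing column, located through the key.
UPDATE customer_dim SET city = 'Austin' WHERE customer_id = 42;
```

Note that the bucketing column itself (customer_id here) cannot be changed by an UPDATE, and Hive still does not check that its values are unique.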

@Andrew Watson The ACID features have been taken back by the community and are currently not recommended for customer use.