Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Can I create a primary key in a Hive table? I saw that in TBLPROPERTIES you can specify "PRIMARY KEY"="col_name". What does it actually do?

1 ACCEPTED SOLUTION


@Abdus Sagir Mollah Designating a primary key in TBLPROPERTIES is simply metadata describing the column. Hive does not apply any uniqueness or referential constraints based on it.
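To illustrate the point, here is a minimal HiveQL sketch. The table name `customer` and its columns are hypothetical, and it assumes a Hive version that supports `INSERT ... VALUES` (0.14 or later). The property is stored as an ordinary key/value pair, so nothing stops a duplicate key from being inserted.

```sql
-- Hypothetical table; "PRIMARY KEY" here is just a free-form table property.
CREATE TABLE customer (
  customer_id BIGINT,
  name        STRING
)
TBLPROPERTIES ("PRIMARY KEY"="customer_id");

-- Both inserts are accepted: the property triggers no uniqueness check.
INSERT INTO TABLE customer VALUES (1, 'Alice');
INSERT INTO TABLE customer VALUES (1, 'Bob');

-- The property can be read back as descriptive metadata.
SHOW TBLPROPERTIES customer;
```

Tools or people reading the table definition can use the property as documentation, but keeping the key unique remains the responsibility of whatever loads the data.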


3 REPLIES


@Abdus Sagir Mollah

Primary keys can also be useful for bucketing (i.e., partitioning of data), especially if you are trying to leverage the ACID capabilities of Hive.

Quote from the blog linked below:

  • Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys.

Entire blog: http://hortonworks.com/blog/adding-acid-to-apache-hive/
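As a rough sketch of what "bucketed and sorted on those keys" could look like in HiveQL: the table name `customer_dim`, its columns, and the bucket count of 8 are all assumptions chosen for illustration, not values taken from the blog.

```sql
-- Hypothetical dimension table, bucketed and sorted on its logical key.
-- Hive hashes customer_id to choose a bucket; it does not enforce uniqueness.
CREATE TABLE customer_dim (
  customer_id BIGINT,
  name        STRING,
  city        STRING
)
CLUSTERED BY (customer_id) SORTED BY (customer_id ASC) INTO 8 BUCKETS
STORED AS ORC;
```

The bucket count is a tuning choice that depends on data volume, so treat the number above as a placeholder.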


@Andrew Watson - The ACID features have been pulled back by the community. They are not currently recommended for customer use.