Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Hive primary on a partitioned column

avatar
Expert Contributor

I want to add primary/foreign key constraints to a hive table. The only think is that my PK is actually a partitioned column. For instance:

CREATE TABLE pk 
(
  id INT, 
  PRIMARY KEY(part) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING)

This fails with the error message:

DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part

Is there a way to use a partitioned column as PK?

Context: hp 2.6, hive 2.1 with llap.

1 ACCEPTED SOLUTION

avatar
Guru

@Guillaume Roger

partition is viewed as a new column in the table definition hence you can't partition by an already existing field.

as a side note, PK is not working as in standard SGBD in Hive, it's just here for compliance (ie you can't deduplicate fields just by adding a PK)

View solution in original post

2 REPLIES 2

avatar

A side note: You should not partition on any columns with high cardinality such as IDs. You would use bucketing instead

avatar
Guru

@Guillaume Roger

partition is viewed as a new column in the table definition hence you can't partition by an already existing field.

as a side note, PK is not working as in standard SGBD in Hive, it's just here for compliance (ie you can't deduplicate fields just by adding a PK)