Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive primary on a partitioned column

Solved Go to solution

Hive primary on a partitioned column

Expert Contributor

I want to add primary/foreign key constraints to a hive table. The only think is that my PK is actually a partitioned column. For instance:

CREATE TABLE pk 
(
  id INT, 
  PRIMARY KEY(part) DISABLE NOVALIDATE
)
PARTITIONED BY (part STRING)

This fails with the error message:

DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part

Is there a way to use a partitioned column as PK?

Context: hp 2.6, hive 2.1 with llap.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Hive primary on a partitioned column

Guru

@Guillaume Roger

partition is viewed as a new column in the table definition hence you can't partition by an already existing field.

as a side note, PK is not working as in standard SGBD in Hive, it's just here for compliance (ie you can't deduplicate fields just by adding a PK)

2 REPLIES 2

Re: Hive primary on a partitioned column

A side note: You should not partition on any columns with high cardinality such as IDs. You would use bucketing instead

Re: Hive primary on a partitioned column

Guru

@Guillaume Roger

partition is viewed as a new column in the table definition hence you can't partition by an already existing field.

as a side note, PK is not working as in standard SGBD in Hive, it's just here for compliance (ie you can't deduplicate fields just by adding a PK)

Don't have an account?
Coming from Hortonworks? Activate your account here