- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive primary on a partitioned column
- Labels:
-
Apache Hive
Created 06-14-2017 10:52 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I want to add primary/foreign key constraints to a hive table. The only think is that my PK is actually a partitioned column. For instance:
CREATE TABLE pk ( id INT, PRIMARY KEY(part) DISABLE NOVALIDATE ) PARTITIONED BY (part STRING)
This fails with the error message:
DBCException: SQL Error [10002] [42000]: Error while compiling statement: FAILED: SemanticException [Error 10002]: Invalid column reference part
Is there a way to use a partitioned column as PK?
Context: hp 2.6, hive 2.1 with llap.
Created 06-15-2017 09:18 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
partition is viewed as a new column in the table definition hence you can't partition by an already existing field.
as a side note, PK is not working as in standard SGBD in Hive, it's just here for compliance (ie you can't deduplicate fields just by adding a PK)
Created 06-14-2017 12:00 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A side note: You should not partition on any columns with high cardinality such as IDs. You would use bucketing instead
Created 06-15-2017 09:18 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
partition is viewed as a new column in the table definition hence you can't partition by an already existing field.
as a side note, PK is not working as in standard SGBD in Hive, it's just here for compliance (ie you can't deduplicate fields just by adding a PK)
