Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

How can you make the row_id the primary key in a hive table?,how can you make row_id the primary key in HIVE?

avatar
 
1 ACCEPTED SOLUTION

avatar
Expert Contributor
3 REPLIES 3

avatar
Expert Contributor

avatar
@icocio

Thank you

avatar

Hi, I found another way of doing this:

1. I first loaded my data set in HDFS. The data set contained the following columns: rwid, ctrname, clndrdate and clndrmonth.

Note that column rwid had no values.

2. Then i created an external table that maps to this data set in hdfs

CREATE EXTERNAL TABLE IF NOT EXISTS calendar(rwid int, ctrname string, clndrdate DATE, clndrmonth string ) COMMENT 'Calendar for Non Business Days' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '<location of my file in hdfs>';

3. I created an ORC

CREATE TABLE IF NOT EXISTS calendar_nbd(rwid int, ctrname string, clndrdate DATE, clndrmonth string ) COMMENT 'Calendar for Non Business Days' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC;

4. The last step is most important. I used row_number() over() in the insert overwrite query. This automatically updated the rwid column with the row number.

insert overwrite table calendar_nbd select row_number() over () as rwid, ctrname,clndrdate, clndrmonth from calendarnonbusdays;