Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How can you make the row_id the primary key in a hive table?,how can you make row_id the primary key in HIVE?

avatar
Contributor
 
1 ACCEPTED SOLUTION

avatar
Expert Contributor
3 REPLIES 3

avatar
Expert Contributor

avatar
Contributor
@icocio

Thank you

avatar
Contributor

Hi, I found another way of doing this:

1. I first loaded my data set in HDFS. The data set contained the following columns: rwid, ctrname, clndrdate and clndrmonth.

Note that column rwid had no values.

2. Then i created an external table that maps to this data set in hdfs

CREATE EXTERNAL TABLE IF NOT EXISTS calendar(rwid int, ctrname string, clndrdate DATE, clndrmonth string ) COMMENT 'Calendar for Non Business Days' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '<location of my file in hdfs>';

3. I created an ORC

CREATE TABLE IF NOT EXISTS calendar_nbd(rwid int, ctrname string, clndrdate DATE, clndrmonth string ) COMMENT 'Calendar for Non Business Days' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC;

4. The last step is most important. I used row_number() over() in the insert overwrite query. This automatically updated the rwid column with the row number.

insert overwrite table calendar_nbd select row_number() over () as rwid, ctrname,clndrdate, clndrmonth from calendarnonbusdays;