Archives of Support Questions (Read Only)

nandini_bhattac · 02-07-2017

icocio · 02-07-2017

@Nandini Bhattacharjee

You might want to take a look at the following HCC post

https://community.hortonworks.com/questions/22321/can-i-create-primary-key-in-hive-table-i-saw-in-tb...

nandini_bhattac · 02-09-2017

@icocio

Thank you

nandini_bhattac · 02-17-2017

Hi, I found another way of doing this:

1. I first loaded my data set into HDFS. The data set contained the following columns: rwid, ctrname, clndrdate and clndrmonth.

Note that column rwid had no values.

2. Then I created an external table that maps to this data set in HDFS:

CREATE EXTERNAL TABLE IF NOT EXISTS calendar(rwid int, ctrname string, clndrdate DATE, clndrmonth string ) COMMENT 'Calendar for Non Business Days' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '<location of my file in hdfs>';

3. I created an ORC table:

CREATE TABLE IF NOT EXISTS calendar_nbd(rwid int, ctrname string, clndrdate DATE, clndrmonth string ) COMMENT 'Calendar for Non Business Days' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC;

4. The last step is the most important. I used row_number() over () in the insert overwrite query, which populated the rwid column with the row number:

insert overwrite table calendar_nbd select row_number() over () as rwid, ctrname, clndrdate, clndrmonth from calendar;
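
One caveat with step 4: row_number() over () has no ORDER BY, so which row receives which number is not guaranteed to be the same from one run to the next. Below is a minimal sketch of a more repeatable variant. It assumes the source is the external table calendar from step 2 and that (clndrdate, ctrname) is a sensible sort key for this data; neither is confirmed in the thread. The second query is a sanity check that rwid came out unique. The last statement is optional and assumes Hive 2.1.0 or later, where a primary key can be declared but is informational only and not enforced; the constraint name pk_calendar_nbd is made up for illustration.

```sql
-- Assumption: the source is the external table `calendar` from step 2.
-- Assumption: (clndrdate, ctrname) is an acceptable sort key for this data.
INSERT OVERWRITE TABLE calendar_nbd
SELECT
  row_number() OVER (ORDER BY clndrdate, ctrname) AS rwid,
  ctrname,
  clndrdate,
  clndrmonth
FROM calendar;

-- Sanity check: this returns no rows if every rwid is unique.
SELECT rwid, count(*) AS cnt
FROM calendar_nbd
GROUP BY rwid
HAVING count(*) > 1;

-- Optional, assumes Hive 2.1.0+: declare rwid as an informational primary key.
-- DISABLE NOVALIDATE means Hive neither enforces nor checks the constraint.
-- `pk_calendar_nbd` is a hypothetical constraint name.
ALTER TABLE calendar_nbd
  ADD CONSTRAINT pk_calendar_nbd PRIMARY KEY (rwid) DISABLE NOVALIDATE;
```

If two rows share the same clndrdate and ctrname, their relative order is still arbitrary, so add further columns to the ORDER BY if that matters. A global ORDER BY inside the window also sends all rows through a single reducer, which is fine for a small calendar table but may be slow on large data.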

How can you make the row_id the primary key in a Hive table?