Explorer
Posts: 15
Registered: ‎06-06-2017
Accepted Solution

Any workaround to avoid duplicate records in Impala for a primary key column?

I would appreciate any workaround to avoid duplicate records in Impala for a primary key column.

Champion
Posts: 424
Registered: ‎05-16-2016

Re: Any work round to avoid duplicate records in impala for Primary key column

Are you asking about insertion or retrieval of data?

Explorer
Posts: 15
Registered: ‎06-06-2017

Re: Any work round to avoid duplicate records in impala for Primary key column

I'm thinking of avoiding duplicates during insertion, if this won't cause a performance issue.

Posts: 342
Topics: 11
Kudos: 48
Solutions: 28
Registered: ‎09-02-2016

Re: Any work round to avoid duplicate records in impala for Primary key column

@Msdhan

 

https://www.cloudera.com/documentation/enterprise/5-3-x/topics/impala_porting.html

 

According to the above link: "Take out any CREATE INDEX, DROP INDEX, and ALTER INDEX statements, and equivalent ALTER TABLE statements. Remove any INDEX, KEY, or PRIMARY KEY clauses from CREATE TABLE and ALTER TABLE statements. Impala is optimized for bulk read operations for data warehouse-style queries, and therefore does not support indexes for its tables."

 

So yes, in general you cannot get both performance and indexing. If possible, try to control duplicates in the source (SELECT) portion of the statement rather than in the target (INSERT) portion.

 

Ex:

insert into table trg_table 

select * from src_table
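
To make the "control it in the SELECT" idea concrete, here is a minimal sketch. It assumes both tables have a column named id that acts as the logical key; that column name is hypothetical and does not come from the thread. The filter skips source rows whose key already exists in the target.

```sql
-- Sketch only: `id` is an assumed key column, not something Impala enforces.
-- Insert only those source rows whose key is not already in the target.
INSERT INTO TABLE trg_table
SELECT s.*
FROM src_table s
WHERE NOT EXISTS (
  SELECT 1
  FROM trg_table t
  WHERE t.id = s.id
);
```

Note that this does not remove duplicates that exist inside src_table itself; those would need a GROUP BY or similar in the SELECT as well.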

 

 

 

Champion
Posts: 424
Registered: ‎05-16-2016

Re: Any work round to avoid duplicate records in impala for Primary key column

Impala does not have the concept of a PK. However, you have two options.

Down the road, if you want to delete single rows, you can't do that on regular Hive/Impala tables. So you can use the Impala-Kudu format instead. With the Kudu format you can create a table with a primary key, and you can also perform single-row deletes.
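
As a rough illustration of the Kudu option, the sketch below uses the Kudu DDL syntax from newer Impala releases (older Impala-Kudu builds used a different, TBLPROPERTIES-based syntax). The table name customer_kudu, its columns, and the partition count are made up for the example.

```sql
-- Sketch only: table name, columns, and partition count are hypothetical.
CREATE TABLE customer_kudu (
  id BIGINT,
  name STRING,
  street STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;

-- UPSERT inserts the row, or updates it when the key already exists,
-- so the table never holds two rows with the same id.
UPSERT INTO customer_kudu VALUES (1, 'Alice', 'Main St');
UPSERT INTO customer_kudu VALUES (1, 'Alice', 'Oak Ave');

-- Single-row delete by primary key.
DELETE FROM customer_kudu WHERE id = 1;
```

After the two UPSERT statements there should be one row for id 1, carrying the second street value.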

 

Or, the hard way to achieve this is:

 

 

STEP 1 - Create a staging table
CREATE TABLE Sample (name STRING, street STRING, RD123 TIMESTAMP);
(Assume RD123 is unique, since we don't have a PK.)
STEP 2 - Load the data
Perform the LOAD DATA INTO Sample.
STEP 3 - Create another table without the duplicates
CREATE TABLE sample_no_dupli AS SELECT name, street, MAX(RD123) AS createdate FROM Sample GROUP BY name, street;
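
A possible variation on STEP 3, in case the table has more columns than the ones you group by: rank the rows per key with an analytic function and keep only the newest one. This is a sketch, not part of the original suggestion; the target name sample_latest is made up, and it assumes (name, street) identifies a record.

```sql
-- Sketch only: keeps the most recent row per (name, street), based on RD123.
CREATE TABLE sample_latest AS
SELECT name, street, RD123
FROM (
  SELECT name, street, RD123,
         ROW_NUMBER() OVER (PARTITION BY name, street
                            ORDER BY RD123 DESC) AS rn
  FROM Sample
) ranked
WHERE rn = 1;
```

Any extra columns can be added to both SELECT lists and will come from the same row that won the ranking.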

 

 

 

Explorer
Posts: 15
Registered: ‎06-06-2017

Re: Any work round to avoid duplicate records in impala for Primary key column

Thanks, Saranvisa, for this explanation.
Explorer
Posts: 15
Registered: ‎06-06-2017

Re: Any work round to avoid duplicate records in impala for Primary key column

csguna, I appreciate your input. Will try this.

Champion
Posts: 424
Registered: ‎05-16-2016

Re: Any work round to avoid duplicate records in impala for Primary key column

@Msdhan You're welcome :))
