Reply
Contributor
Posts: 29
Registered: ‎02-11-2019
Accepted Solution

Get Last Insert in Impala partition

Hi,

 

I need to retrieve only the last entry in a given partition if there are multiple entries therein

 

Assume I create an external table partitioned by date:
create external table test_lb (field1 string, field2 string, field3 string)
  partitioned by (year string, month string , day string, host string)
  row format delimited fields terminated by ','

Then I insert multiple records to same partition. i,e. same year,month,day
insert into test_lb partition (year="2013", month="07", day="28") values ("foo1", "FOO2", "FOO3");
insert into test_lb partition (year="2013", month="07", day="28") values ("foo4", "FOO5", "FOO6");

 

How do I retrieve just the most recent entry via a query... is there an inbuilt way to get only the latest values

Cloudera Employee
Posts: 437
Registered: ‎07-29-2015

Re: Get Last Insert in Impala partition

This isn't possible unless you include a timestamp or sequence number in every record. There's no concept of an order of rows built into Hive or Impala.
Highlighted
Contributor
Posts: 29
Registered: ‎02-11-2019

Re: Get Last Insert in Impala partition

Thanks