Reply
Explorer
Posts: 13
Registered: ‎02-29-2016
Accepted Solution

Sequence number generation in impala

I want to generate sequence number for each new record in Impala.

 

1. First load - am loading 100 records with sequence number starting from 1 to 100.

2. Second load - i need to load new 10 records which are not present in first load . So my sequence number has to start from 101 to 110 for these new to records.

 

I think there is no sequence function here and Next value cannot be used to hold the max value.

Is this possible with any other function or UDFs has to be written ?

Highlighted
Cloudera Employee
Posts: 318
Registered: ‎07-29-2015

Re: Sequence number generation in impala

You could possibly use the ROW_NUMBER() analytic function as part of the solution.

 

http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_analytic_functions.html...

 

 

E.g.

 

select t2.last_id + 1 + row_number() over (order by table1.col1), table1.col1

from table1

   inner join (select max(id) last_id from table2) t2

 

This wouldn't be safe if you are running multiple insert concurrently.

Explorer
Posts: 13
Registered: ‎02-29-2016

Re: Sequence number generation in impala

[ Edited ]

Thanks Tim for answering it. I tried using UDFrowsequence java code and it always start from 1. 

 

http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRo...

 

Is there a way to start from max value by passing it ?

 

 

Cloudera Employee
Posts: 318
Registered: ‎07-29-2015

Re: Sequence number generation in impala

The Hive UDF won't help you. If you look at the Hive issue tracker https://issues.apache.org/jira/browse/HIVE-1304, row_sequence() was added as a workaround because they didn't support the row_number() analytic function at that point in time.

 

We support the row_number() analytic function in Impala, so there's no reason to try to use that UDF.

 

If you want to start froma particular number, can't you just add it like I suggested in my previous answer?

Explorer
Posts: 13
Registered: ‎02-29-2016

Re: Sequence number generation in impala

[ Edited ]

Yes Tim. That works. Here we need to join target table (table2). 

 

If CREATE SEQUENCE function is there in impala, it would be of more useful instead of doing these tricky things.

 

Thanks for your reply.

 

New Contributor
Posts: 1
Registered: ‎02-15-2017

Re: Sequence number generation in impala

Try using this hive UDF in Impala: https://github.com/manojkumarvohra/hive-hilo

New Contributor
Posts: 2
Registered: ‎04-24-2017

Re: Sequence number generation in impala

Hi Tim,

 

Thanks for the solution but i'm not able to understand it prorperly. Can you provide a query with the below issue description. I tried using your solution query but wasn't able to map the col's. So i have a table buyers and its has one col 'id' for which every time an insert happens i want that col to get incremental value's for each insert.

Table Buyers

Col: id,name,created on

Can you provide a sample query for this table. It will be really helpful to me as i'm stcuk on this from long time.

Contributor
Posts: 25
Registered: ‎06-13-2017

Re: Sequence number generation in impala

does it have to be a sequence? or would a unique value be sufficient? If that's the case Impala's got a uuid() function that you can use. Or if a BIGINT is required you can hash the uuid() to get a BIGINT value.
Explorer
Posts: 8
Registered: ‎09-25-2017

Re: Sequence number generation in impala

[ Edited ]
 
Cloudera Employee
Posts: 318
Registered: ‎07-29-2015

Re: Sequence number generation in impala

Announcements