Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Sequence number generation in impala

avatar
Rising Star

I want to generate sequence number for each new record in Impala.

 

1. First load - am loading 100 records with sequence number starting from 1 to 100.

2. Second load - i need to load new 10 records which are not present in first load . So my sequence number has to start from 101 to 110 for these new to records.

 

I think there is no sequence function here and Next value cannot be used to hold the max value.

Is this possible with any other function or UDFs has to be written ?

1 ACCEPTED SOLUTION

avatar

The Hive UDF won't help you. If you look at the Hive issue tracker https://issues.apache.org/jira/browse/HIVE-1304, row_sequence() was added as a workaround because they didn't support the row_number() analytic function at that point in time.

 

We support the row_number() analytic function in Impala, so there's no reason to try to use that UDF.

 

If you want to start froma particular number, can't you just add it like I suggested in my previous answer?

View solution in original post

9 REPLIES 9

avatar

You could possibly use the ROW_NUMBER() analytic function as part of the solution.

 

http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_analytic_functions.html...

 

 

E.g.

 

select t2.last_id + 1 + row_number() over (order by table1.col1), table1.col1

from table1

   inner join (select max(id) last_id from table2) t2

 

This wouldn't be safe if you are running multiple insert concurrently.

avatar
Rising Star

Thanks Tim for answering it. I tried using UDFrowsequence java code and it always start from 1. 

 

http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRo...

 

Is there a way to start from max value by passing it ?

 

 

avatar

The Hive UDF won't help you. If you look at the Hive issue tracker https://issues.apache.org/jira/browse/HIVE-1304, row_sequence() was added as a workaround because they didn't support the row_number() analytic function at that point in time.

 

We support the row_number() analytic function in Impala, so there's no reason to try to use that UDF.

 

If you want to start froma particular number, can't you just add it like I suggested in my previous answer?

avatar
Rising Star

Yes Tim. That works. Here we need to join target table (table2). 

 

If CREATE SEQUENCE function is there in impala, it would be of more useful instead of doing these tricky things.

 

Thanks for your reply.

 

avatar
Explorer

Hi Tim,

 

Thanks for the solution but i'm not able to understand it prorperly. Can you provide a query with the below issue description. I tried using your solution query but wasn't able to map the col's. So i have a table buyers and its has one col 'id' for which every time an insert happens i want that col to get incremental value's for each insert.

Table Buyers

Col: id,name,created on

Can you provide a sample query for this table. It will be really helpful to me as i'm stcuk on this from long time.

avatar
Rising Star
does it have to be a sequence? or would a unique value be sufficient? If that's the case Impala's got a uuid() function that you can use. Or if a BIGINT is required you can hash the uuid() to get a BIGINT value.

avatar
Contributor
 

avatar

avatar
New Contributor

Try using this hive UDF in Impala: https://github.com/manojkumarvohra/hive-hilo