Created 02-29-2016 12:57 PM
I want to generate sequence number for each new record in hive.
1. First load - am loading 100 records with sequence number starting from 1 to 100.
2. Second load - i need to load new 10 records which are not present in first load . So my sequence number has to start from 101 to 110 for these new to records.
I think there is no sequence function here and Next value cannot be used to hold the max value.
Is this possible with any other function or UDFs has to be written ?
Created 02-29-2016 01:19 PM
It is almost impossible to get a steadily increasing not interrupted sequence number in a parallel data warehouse like Hive.
There is the following UDF but you would have to restrict the number of mappers to 1 to make this work.
There are some tricks like Accumulators in Spark for example that can do similar things ( give unique numbers at least if not steadily increasing ) but not in Hive as far as I know.
Created 02-29-2016 05:31 PM
Generating a sequential order in a distributed processing query is not possible with simple UDFs. This is because the approach will require some centralised entity to keep track of the counter, which will also result in severe inefficiency for distributed queries and is not recommended to apply.
Created 04-30-2016 02:18 PM
I have written an implementation for a distributed sequence in here:
Created 12-15-2016 04:01 PM
Dear All,
I need to generate the surrogate keys(which are sequential) in HIVE.I do not want to go and use the method (java.util.UUID) since it generates 33 bytes of lenghth and i do not want that much length(Though it was unique).So predominantly i can do that in 2 ways if i am not wrong:
1st Method:
To restrict the number of mappers to 1 and use the code for random unique generation of UDF in HIVE.The code repository as mentioned in the above post.
2nd Method:
**** Please Note : I think we cannot restrict the number of mappers to 1 and hence both the methods fail for the creation of unique keys. ****
Please let me know if there is a way to do this?
Thanks,
Lalit
Created 12-15-2016 04:01 PM
Hi Manoj-I saw your post and are you sure that your code will work?
Created 12-15-2016 04:07 PM
Yes, it does works. One of our clients made use of it in production.
Created 12-15-2016 04:02 PM
The number of mappers cannot be restricted to 1 since the number of mappers depend on the data which is number of input splits.
Created 12-15-2016 04:39 PM
Are you talking about the sequence number generation in HIVE? And you are telling it can be done with large datasets?
By the way can you please send me your email or contact number?
Thanks
Created 12-15-2016 04:43 PM
Yep, its for hive;)
Give it a try as per instructions mentioned on my github page.
If you face any problems shoot a mail on manojkumarvohra9@gmail.com