I want to generate sequence number for each new record in hive.
1. First load - am loading 100 records with sequence number starting from 1 to 100.
2. Second load - i need to load new 10 records which are not present in first load . So my sequence number has to start from 101 to 110 for these new to records.
I think there is no sequence function here and Next value cannot be used to hold the max value.
Is this possible with any other function or UDFs has to be written ?
It is almost impossible to get a steadily increasing not interrupted sequence number in a parallel data warehouse like Hive.
There is the following UDF but you would have to restrict the number of mappers to 1 to make this work.
There are some tricks like Accumulators in Spark for example that can do similar things ( give unique numbers at least if not steadily increasing ) but not in Hive as far as I know.
Generating a sequential order in a distributed processing query is not possible with simple UDFs. This is because the approach will require some centralised entity to keep track of the counter, which will also result in severe inefficiency for distributed queries and is not recommended to apply.
I need to generate the surrogate keys(which are sequential) in HIVE.I do not want to go and use the method (java.util.UUID) since it generates 33 bytes of lenghth and i do not want that much length(Though it was unique).So predominantly i can do that in 2 ways if i am not wrong:
To restrict the number of mappers to 1 and use the code for random unique generation of UDF in HIVE.The code repository as mentioned in the above post.
**** Please Note : I think we cannot restrict the number of mappers to 1 and hence both the methods fail for the creation of unique keys. ****
Please let me know if there is a way to do this?
Are you talking about the sequence number generation in HIVE? And you are telling it can be done with large datasets?
By the way can you please send me your email or contact number?
Yep, its for hive;)
Give it a try as per instructions mentioned on my github page.
If you face any problems shoot a mail on email@example.com