- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Sequence number generation in Hive
- Labels:
-
Apache Hive
Created ‎02-29-2016 12:57 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I want to generate sequence number for each new record in hive.
1. First load - am loading 100 records with sequence number starting from 1 to 100.
2. Second load - i need to load new 10 records which are not present in first load . So my sequence number has to start from 101 to 110 for these new to records.
I think there is no sequence function here and Next value cannot be used to hold the max value.
Is this possible with any other function or UDFs has to be written ?
Created ‎02-29-2016 01:19 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It is almost impossible to get a steadily increasing not interrupted sequence number in a parallel data warehouse like Hive.
There is the following UDF but you would have to restrict the number of mappers to 1 to make this work.
There are some tricks like Accumulators in Spark for example that can do similar things ( give unique numbers at least if not steadily increasing ) but not in Hive as far as I know.
Created ‎02-29-2016 05:31 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Generating a sequential order in a distributed processing query is not possible with simple UDFs. This is because the approach will require some centralised entity to keep track of the counter, which will also result in severe inefficiency for distributed queries and is not recommended to apply.
Created ‎04-30-2016 02:18 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have written an implementation for a distributed sequence in here:
Created ‎12-15-2016 04:01 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear All,
I need to generate the surrogate keys(which are sequential) in HIVE.I do not want to go and use the method (java.util.UUID) since it generates 33 bytes of lenghth and i do not want that much length(Though it was unique).So predominantly i can do that in 2 ways if i am not wrong:
1st Method:
To restrict the number of mappers to 1 and use the code for random unique generation of UDF in HIVE.The code repository as mentioned in the above post.
2nd Method:
- 1)Load data to a temp hive table
- 2)Lets get the max value in the table
- 3)Lets use row_number over() to generate row_number+1 and load to actual table.
**** Please Note : I think we cannot restrict the number of mappers to 1 and hence both the methods fail for the creation of unique keys. ****
Please let me know if there is a way to do this?
Thanks,
Lalit
Created ‎12-15-2016 04:01 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Manoj-I saw your post and are you sure that your code will work?
Created ‎12-15-2016 04:07 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, it does works. One of our clients made use of it in production.
Created ‎12-15-2016 04:02 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The number of mappers cannot be restricted to 1 since the number of mappers depend on the data which is number of input splits.
Created ‎12-15-2016 04:39 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Are you talking about the sequence number generation in HIVE? And you are telling it can be done with large datasets?
By the way can you please send me your email or contact number?
Thanks
Created ‎12-15-2016 04:43 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yep, its for hive;)
Give it a try as per instructions mentioned on my github page.
If you face any problems shoot a mail on manojkumarvohra9@gmail.com
