Contributor
Posts: 29
Registered: 02-11-2019

Generating Unique ID using Zookeeper

Hi all.

We need to generate unique IDs in our Hadoop cluster during data ingestion.

We have parallel processes ingesting data from different sources into Hive tables, and we'd like a unique ID for each data row inserted.

I understand ZooKeeper offers unique ID generation for distributed scenarios.

Could you please help with how to do this? I can't find a sample or any documentation.

Also, please let me know if there is a better distributed unique ID generator in the Cloudera environment.

Thanks

Posts: 1,903
Kudos: 435
Solutions: 305
Registered: 07-31-2013

Re: Generating Unique ID using Zookeeper

Are you looking for a sequentially growing ID or just a universally unique ID?

For the former, you can use Curator over ZooKeeper with this recipe: https://curator.apache.org/curator-recipes/distributed-atomic-long.html
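
As a rough illustration of that recipe, here is a minimal sketch of incrementing a shared counter with Curator's DistributedAtomicLong. It assumes the curator-recipes dependency is on the classpath and a ZooKeeper ensemble is reachable. The connection string "localhost:2181", the counter path "/ingest/row-id-counter", the retry settings, and the class name are placeholders, not values from your cluster.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.atomic.AtomicValue;
import org.apache.curator.framework.recipes.atomic.DistributedAtomicLong;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Hypothetical example class; names and settings are assumptions for illustration.
final class SequentialIdExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; point this at your ZooKeeper quorum.
        try (CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3))) {
            client.start();

            // All ingest processes must use the same counter path (placeholder here).
            DistributedAtomicLong counter = new DistributedAtomicLong(
                    client, "/ingest/row-id-counter", new ExponentialBackoffRetry(100, 5));

            // increment() may fail under contention, so always check succeeded().
            AtomicValue<Long> result = counter.increment();
            if (result.succeeded()) {
                System.out.println("next id: " + result.postValue());
            } else {
                System.err.println("increment did not succeed; retry or fail the row");
            }
        }
    }
}
```

One ZooKeeper round trip per row will likely be slow for bulk ingestion. A common workaround is to reserve a block of IDs per call with add(delta) and hand them out locally, at the cost of gaps in the sequence if a process dies.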

For the latter, a UUID generator may suffice.
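
If a random UUID is enough, the JDK already covers it and no coordination service is needed. A minimal sketch (the class and method names are made up for illustration):

```java
import java.util.UUID;

// Hypothetical helper; the name is an assumption for illustration.
final class RowIdExample {
    // Returns a random (version 4) UUID in its canonical 36-character string form.
    static String newRowId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(newRowId());
    }
}
```

Each ingest process can call this independently. The IDs are not ordered, so this only fits if you do not need them to sort by insertion time.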

For a more 'distributed' solution, check out Twitter's Snowflake: https://github.com/twitter-archive/snowflake/tree/snowflake-2010
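
To give an idea of how a Snowflake-style generator works, here is a simplified sketch, not Twitter's actual code. It packs a millisecond timestamp, a worker ID, and a per-millisecond sequence into one 64-bit long. The bit widths (10 worker bits, 12 sequence bits), the custom epoch, and the class name are assumptions for illustration. Each ingest process would need its own distinct worker ID, assigned by some means outside this sketch.

```java
// Hypothetical Snowflake-style generator; layout and names are assumptions.
final class SnowflakeSketch {
    // Arbitrary custom epoch (2019-01-01T00:00:00Z) to keep timestamps small.
    private static final long EPOCH_MS = 1546300800000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER + "]");
        }
        this.workerId = workerId;
    }

    // Returns an ID that is unique per worker and increasing over time.
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            // Refuse to issue IDs rather than risk duplicates after a clock step.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond; wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMs) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeSketch generator = new SnowflakeSketch(1);
        System.out.println(generator.nextId());
    }
}
```

The appeal of this design is that no network call is needed per ID. The only coordination is making sure no two processes share a worker ID, which is something ZooKeeper could be used for once at startup.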