Generating Unique IDs Using ZooKeeper

Contributor

Hi all.

We need to generate unique IDs in our Hadoop cluster during data ingestion.

We have parallel processes ingesting data from different sources into Hive tables, and we'd like a unique ID for each row inserted.

I understand ZooKeeper offers unique ID generation for distributed scenarios.

Could you explain how to do this? I can't find any samples or documentation.

Also, please let me know if there is a better distributed unique ID generator in the Cloudera environment.

Thanks

Mentor
Are you looking for a sequentially growing ID or just a universally unique ID?

For the former, you can use Apache Curator (a ZooKeeper client library) and its DistributedAtomicLong recipe: https://curator.apache.org/curator-recipes/distributed-atomic-long.html
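To show what that recipe does under the hood: as I understand it, DistributedAtomicLong first tries an optimistic update. It reads the counter's current value together with the znode's version, then writes value + delta only if the version is unchanged, retrying when another client won the race (it can optionally fall back to a lock). The sketch below simulates that compare-and-set loop in-process. VersionedNode, BadVersionError and increment are hypothetical names, and the in-memory node merely stands in for a real znode, so this illustrates the algorithm, not Curator's API.

```python
import threading


class BadVersionError(Exception):
    """Raised when a conditional write sees a stale version (mimics ZooKeeper's BadVersion)."""


class VersionedNode:
    """Hypothetical in-memory stand-in for one znode: a value plus a version counter."""

    def __init__(self, value: int = 0):
        self._lock = threading.Lock()  # only makes each single call atomic, like one ZooKeeper op
        self._value = value
        self._version = 0

    def get(self):
        """Return (value, version), like a getData call that also returns the Stat."""
        with self._lock:
            return self._value, self._version

    def set_if_version(self, new_value: int, expected_version: int) -> int:
        """Write only if nobody else wrote since we read (like setData().withVersion())."""
        with self._lock:
            if self._version != expected_version:
                raise BadVersionError()
            self._value = new_value
            self._version += 1
            return self._version


def increment(node: VersionedNode, delta: int = 1, max_retries: int = 1000) -> int:
    """Optimistic compare-and-set increment; returns the value this caller installed."""
    for _ in range(max_retries):
        value, version = node.get()
        try:
            node.set_if_version(value + delta, version)
            return value + delta  # this value is now ours alone
        except BadVersionError:
            continue  # another writer got in first: re-read and try again
    raise RuntimeError("gave up after too many conflicting updates")
```

Every increment against a real ZooKeeper ensemble is a network round trip and a quorum write, so one increment per inserted row may become a bottleneck at high ingestion rates. A common workaround (a design suggestion, not part of the recipe) is to increment by a block size such as 1000 and hand out the reserved range locally.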

For the latter, a UUID generator may suffice.
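A minimal sketch of the UUID route in Python, assuming each ingestion process can tag its own rows before writing them. A version 4 UUID carries 122 random bits, so processes need no coordination at all; collisions are possible in principle but vanishingly unlikely. The IDs are not ordered, though. The function names and the "row_id" column are hypothetical.

```python
import uuid


def new_row_id() -> str:
    """Return a random (version 4) UUID in its canonical 36-character form."""
    return str(uuid.uuid4())


def tag_rows(rows):
    """Return copies of the row dicts with a hypothetical 'row_id' column added."""
    return [{**row, "row_id": new_row_id()} for row in rows]


if __name__ == "__main__":
    for row in tag_rows([{"source": "a"}, {"source": "b"}]):
        print(row)
```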

For a more 'distributed' solution, check out Twitter's Snowflake: https://github.com/twitter-archive/snowflake/tree/snowflake-2010
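Snowflake packs a millisecond timestamp, a machine identifier and a per-millisecond sequence number into one 64-bit integer. Each worker then generates IDs locally, roughly time-ordered, and the only coordination needed is handing every worker a distinct ID. The sketch below is a Python approximation of that layout, assuming 41 timestamp bits (about 69 years), 10 worker bits (split into datacenter and worker IDs in the original) and 12 sequence bits, with Twitter's custom epoch. SnowflakeGenerator and the constants are illustrative names, not the project's code.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch (Nov 2010); any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Snowflake-style ID generator: timestamp | worker id | sequence (top bit stays 0)."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError(f"worker_id must be in [0, {MAX_WORKER_ID}]")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Issuing IDs now could repeat ones already handed out.
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # All 4096 sequence values used this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Each ingestion process would create one generator with its own worker_id; how those IDs get assigned (config, or a ZooKeeper registration) is left open here.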