ChineduLB · 04-02-2019

Hi all.

We need to generate unique IDs in our Hadoop cluster during data ingestion.

We have parallel processes ingesting data from different sources into Hive tables, and we'd like a unique ID for each row inserted.

I understand ZooKeeper offers unique ID generation for distributed scenarios.

Could you help with how to do this? I can't find a sample or any documentation.

Also, please let me know if there is a better distributed unique ID generator in the Cloudera environment.

Thanks

Harsh J · 05-08-2019

Are you looking for a sequentially growing ID or just a universally unique ID?

For the former, you can use Curator over ZooKeeper with this recipe: https://curator.apache.org/curator-recipes/distributed-atomic-long.html

For the latter, a UUID generator may suffice.
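As a minimal sketch of the UUID route, assuming each ingestion job runs in a JVM and stamps every row before writing it to Hive, the JDK's own `java.util.UUID` is enough. The class name `RowIdGenerator` is hypothetical and only for illustration.

```java
import java.util.UUID;

// Hypothetical helper: each ingestion process calls newRowId() once per row.
// No coordination between processes is needed, because the IDs are random.
final class RowIdGenerator {

    // Returns a random (version 4) UUID as a 36-character string,
    // which fits in a Hive STRING column.
    static String newRowId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Print a few IDs; every run produces different values.
        for (int i = 0; i < 3; i++) {
            System.out.println(newRowId());
        }
    }
}
```

Version 4 UUIDs carry 122 random bits, so collisions across parallel processes are negligible in practice. The trade-off is that they are not ordered and are larger than a 64-bit `BIGINT` key.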

For a more 'distributed' solution, check out Twitter's Snowflake: https://github.com/twitter-archive/snowflake/tree/snowflake-2010
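The Snowflake repository is archived, so here is a rough idea of the scheme: an independent Java sketch of the commonly described Snowflake layout, not Twitter's code. It packs a 41-bit millisecond timestamp, a 10-bit worker ID and a 12-bit per-millisecond sequence into one 64-bit long. The class name, the custom epoch (2019-01-01 UTC) and the worker IDs are assumptions. Each parallel ingestion process must get a distinct worker ID, for example from configuration or leased through ZooKeeper.

```java
// Hypothetical Snowflake-style generator (independent sketch, not Twitter's code).
// Assumed layout: 1 unused sign bit | 41 bits ms since EPOCH | 10 bits worker id | 12 bits sequence
final class SnowflakeIdGenerator {
    private static final long EPOCH = 1546300800000L; // 2019-01-01T00:00:00Z, hypothetical custom epoch
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;   // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    // Returns a new ID; IDs from one generator are strictly increasing.
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse rather than risk issuing duplicates.
            throw new IllegalStateException("Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: wait for the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    // Extracts the worker-id field from an ID produced by this layout.
    static long workerIdOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_WORKER_ID;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator gen = new SnowflakeIdGenerator(7); // hypothetical worker id for this process
        for (int i = 0; i < 3; i++) {
            System.out.println(gen.nextId());
        }
    }
}
```

Unlike UUIDs, these IDs fit in a Hive `BIGINT` and sort roughly by creation time. The price is that uniqueness depends on distinct worker IDs and on clocks not jumping backwards, which is why the sketch throws instead of continuing.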

Generating Unique ID using Zookeeper