<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Using java.util.UUID.randomUUID() for UUID generation in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-java-util-UUID-randomUUID-for-UUID-generation/m-p/122723#M26646</link>
    <description>&lt;P&gt;It's a good question. Assuming the source of entropy is good, the chance of a duplicate is essentially 0: randomUUID() has 122 random bits, so there are 2^122 (roughly 5.3 x 10^36) possible values. Even at a billion records a day, the probability of any collision stays negligible for many years.&lt;/P&gt;&lt;P&gt;There are other ways too. I assume there are some ready-made solutions out there, but how about some old-fashioned MapReduce?&lt;/P&gt;&lt;P&gt;Just one way: assuming you could create all the IDs in one go and the data is stored in a delimited text format, you could build a unique key from the long offset that TextInputFormat provides for each line.&lt;/P&gt;&lt;P&gt;TextInputFormat delivers each line of text together with a long key, the byte offset of the line from the start of the file (computed using the split offsets), so you could add this offset to a starting number (for example a batch ID that is steadily increased) and create a unique number that way.&lt;/P&gt;&lt;P&gt;There are definitely other ways to do it too, for example combining the MapReduce job ID + task ID + row-in-split ID.&lt;/P&gt;</description>
    <pubDate>Sat, 30 Apr 2016 00:22:05 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-04-30T00:22:05Z</dc:date>
    <item>
      <title>Using java.util.UUID.randomUUID() for UUID generation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-java-util-UUID-randomUUID-for-UUID-generation/m-p/122722#M26645</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We have an HDP 2.3.2 cluster (around 50 nodes). We have many jobs that process millions of records every day (sometimes as many as a billion records a day). We need to assign a unique ID (UUID) to each of these records and are looking to use java.util.UUID.randomUUID() for this. From the documentation and Wikipedia we see that randomUUID is good, but there is a very small chance that duplicates can be generated.&lt;/P&gt;&lt;P&gt;I checked the available entropy on our machines and it is &amp;gt;150.&lt;/P&gt;&lt;P&gt;While we can be sure that randomUUID will work for now, is there guidance on when *not* to use randomUUID?&lt;/P&gt;&lt;P&gt;We don't want to go to a centralized service for ID generation, as that would create bottlenecks.&lt;/P&gt;&lt;P&gt;Are there any other alternatives for generating UUIDs in the Hadoop cluster? We have looked at SnowFlake, Flake &amp;amp; FauxFlake, but are not yet convinced they will work for us.&lt;/P&gt;&lt;P&gt;Any pointers on this will be appreciated.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Raga&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2016 21:28:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-java-util-UUID-randomUUID-for-UUID-generation/m-p/122722#M26645</guid>
      <dc:creator>raghavendran_c</dc:creator>
      <dc:date>2016-04-29T21:28:19Z</dc:date>
    </item>
    <item>
      <title>Re: Using java.util.UUID.randomUUID() for UUID generation</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-java-util-UUID-randomUUID-for-UUID-generation/m-p/122723#M26646</link>
      <description>&lt;P&gt;It's a good question. Assuming the source of entropy is good, the chance of a duplicate is essentially 0: randomUUID() has 122 random bits, so there are 2^122 (roughly 5.3 x 10^36) possible values. Even at a billion records a day, the probability of any collision stays negligible for many years.&lt;/P&gt;&lt;P&gt;There are other ways too. I assume there are some ready-made solutions out there, but how about some old-fashioned MapReduce?&lt;/P&gt;&lt;P&gt;Just one way: assuming you could create all the IDs in one go and the data is stored in a delimited text format, you could build a unique key from the long offset that TextInputFormat provides for each line.&lt;/P&gt;&lt;P&gt;TextInputFormat delivers each line of text together with a long key, the byte offset of the line from the start of the file (computed using the split offsets), so you could add this offset to a starting number (for example a batch ID that is steadily increased) and create a unique number that way.&lt;/P&gt;&lt;P&gt;There are definitely other ways to do it too, for example combining the MapReduce job ID + task ID + row-in-split ID.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Apr 2016 00:22:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-java-util-UUID-randomUUID-for-UUID-generation/m-p/122723#M26646</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-04-30T00:22:05Z</dc:date>
    </item>
  </channel>
</rss>

