The Hash Function over different values gives same...

New Contributor

Created 05-03-2017 06:26 AM

select hash('9SH305EJ5');
OK
-339500666

select hash('9SH305EIT');
OK
-339500666

1 ACCEPTED SOLUTION

Contributor

Created 05-03-2017 07:53 AM

hash() is not supposed to generate unique values. It maps each input to an integer, so two different inputs can end up with the same result (a collision). The function is meant for grouping the values of a large data set into smaller subsets, with the integer serving as an index to find the respective subset.
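The collision from the question can be reproduced outside Hive. The following is a minimal sketch, assuming that for a single ASCII string Hive's hash() produces the same value as Java's String.hashCode() (a 31-multiplier polynomial hash); the value reported in the question is consistent with that assumption. The class name and the bucketFor helper are hypothetical and only serve the illustration.

```java
// Minimal sketch: reproduces the collision from the question in plain Java.
// Assumption: for a single ASCII string, Hive's hash() matches this polynomial hash.
class HashCollisionDemo {

    // Same 31-multiplier polynomial as String.hashCode(), written out for clarity.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps around, as in String.hashCode()
        }
        return h;
    }

    // Hypothetical helper: maps a value to one of numBuckets subsets via its hash.
    static int bucketFor(String s, int numBuckets) {
        return Math.floorMod(polyHash(s), numBuckets);
    }

    public static void main(String[] args) {
        String a = "9SH305EJ5";
        String b = "9SH305EIT";

        System.out.println(polyHash(a)); // -339500666
        System.out.println(polyHash(b)); // -339500666

        // The strings differ only in their last two characters, and
        // 'J' * 31 + '5' = 74 * 31 + 53 = 2347 = 73 * 31 + 84 = 'I' * 31 + 'T'.
        System.out.println(bucketFor(a, 16) == bucketFor(b, 16)); // true
    }
}
```

Both strings share the first seven characters, so the hash only has to agree on the last two, which it does. Equal hashes always land in the same bucket, which is fine for grouping but not for identifying rows.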

A good explanation can be found here:

http://preshing.com/20110504/hash-collision-probabilities/

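If you want to generate unique values, have a look at the reflect UDF, for example reflect("java.util.UUID", "randomUUID"). Note that this returns a random identifier on every call rather than a value derived from the input.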
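For reference, this is roughly what the reflect call invokes on the Java side. It is a small sketch rather than Hive code; the class and method names UuidDemo and newId are hypothetical.

```java
import java.util.UUID;

// Sketch of the Java call behind reflect("java.util.UUID", "randomUUID").
class UuidDemo {

    // Returns a new random (version 4) UUID as a 36-character string.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Each call yields a fresh value that does not depend on any input,
        // so the same row gets a different identifier every time it is processed.
        System.out.println(newId());
        System.out.println(newId());
    }
}
```

Because the identifier is random, it suits surrogate keys assigned once, not keys that must be recomputed from the data.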

3 REPLIES

Contributor

Created 05-09-2017 07:35 AM

@Amol Kulkarni - does that answer your question? Solved?

Guru

Created 05-09-2017 01:17 PM

Yes, that's a known one. The hash function in Hive works much like the hash functions used in data structures such as hash tables.

Just as any odd number modulo 2 is always 1, the two values you provided happen to map to the same hash value. To get values that are far less likely to collide, use the md5() function in Hive; its 128-bit digest makes an accidental collision extremely unlikely, although it is still a hash and not strictly guaranteed to be unique. However, I suggest not using this logic to generate a primary key for a table, as the values coming out of md5() will be a total mess.
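To see what such values look like, here is a minimal Java sketch that computes MD5 digests for the two strings from the question. It assumes Hive's md5() corresponds to the standard MD5 digest of the string's bytes rendered as 32 hex characters; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: standard MD5 digests of the two colliding inputs, rendered as hex.
class Md5Demo {

    // Returns the MD5 digest of s (UTF-8 bytes) as a 32-character lowercase hex string.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The two inputs that collide under the 32-bit hash get different digests here.
        System.out.println(md5Hex("9SH305EJ5"));
        System.out.println(md5Hex("9SH305EIT"));
    }
}
```

The digests are long, opaque hex strings, which is the "total mess" mentioned above: workable as a fingerprint, awkward as a primary key.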
