Hive - Do we have a checksum function in Hive?

Expert Contributor

In Hive, I want to compare the data between two tables. I want to generate a checksum for each column in each table and then compare the checksums column by column.

I would appreciate it if you could let me know whether Hive has a checksum function.

11 REPLIES

Guru

Thanks @Keith Mascarenhas, that is good to know.

Super Collaborator

@Praveen PentaReddy,

As @Timothy Spann mentioned above, Hive's MD5 function should work. But since it is not built into your version of Hive, you still have a few more options to choose from:

  1. Use the built-in "hash" function (only 32 bits, since it returns an INT)
  2. Create a custom UDF and implement the checksum as you would in regular Java
  3. Use the reflect UDF:
SELECT reflect('org.apache.commons.codec.digest.DigestUtils', '<your_method>', 'your_string')

where <your_method> can be md5Hex, sha1Hex, etc., and 'your_string' is the column or value to hash.

When choosing, consider the trade-off between coding/development effort and query performance.
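
For option 2, the checksum logic in plain Java could look roughly like the sketch below. This is only an illustration: the class name ColumnChecksum and the method md5Hex are made up for this example, and it uses only the JDK's MessageDigest. To use it from Hive you would still need to wrap the call in a UDF class (for example, one extending org.apache.hadoop.hive.ql.exec.UDF with an evaluate method), package it in a jar, and register it, none of which is shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; the names are not from Hive or from this thread.
final class ColumnChecksum {

    // Returns the lowercase hex MD5 digest of a value, or null when the value is null
    // (so a NULL column value stays NULL instead of failing).
    static String md5Hex(String value) {
        if (value == null) {
            return null;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to treat the byte as unsigned before formatting as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard JDK algorithm, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Quick manual check with a sample value.
        System.out.println(md5Hex("abc"));
    }
}
```

A digest like this gives you 128 bits per value instead of the 32 bits from "hash", at the cost of building and deploying the jar.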