Support Questions

Find answers, ask questions, and share your expertise

Hive - Do we have checksum in hive ?

avatar
Expert Contributor

In Hive, I want to compare the data between two tables and i want to generate a checksum for each column and then compare each checksum for each column.

Appreciate if you could let me know if there is any checksum function in hive.

11 REPLIES 11

avatar
Master Guru

You can use md5 on each column

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

stringmd5(string/binary)Calculates an MD5 128-bit checksum for the string or binary (as of Hive 1.3.0). The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Example: md5('ABC') = '902fbdd2b1df0c4f70b4a5d23525e932'.

avatar
Expert Contributor

Thanks and when i am trying to give command in hive. I am getting SemanticException [Error 10011]: Line 1:7 Invalid function 'md5/'

select md5(location) as md_location from testing;

Basically i am trying to get the checksum on the location column.

avatar
Guru

Can you share the complete output ? works just fine for me:

0: jdbc:hive2://localhost:10000/default> select md5(col1) as md_location from test1 limit 5;
+-----------------------------------+--+
|            md_location            |
+-----------------------------------+--+
| c4ca4238a0b923820dcc509a6f75849b  |
| c81e728d9d4c2f636f067f89cc14862c  |
| eccbc87e4b5ce2fe28308fd9f2a7baf3  |
| a87ff679a2f3e71d9181a67b7542122c  |
| e4da3b7fbbce2345d7772b0674a318d5  |
+-----------------------------------+--+
5 rows selected (0.115 seconds)



Which version of HDP are you using ?

avatar
Expert Contributor

Attached is my screen shot...

capture.png

avatar
Guru

Thanks for the capture @Praveen PentaReddy, in one of the related posts here

https://community.hortonworks.com/questions/68012/hive-cant-get-the-md5-value.html

It seems like apparently the version of hive which is available below HDP 2.5, might not have that available. That is the same sandbox I used to perform the test at my end. If you don't have that option, you can still write your own custom UDF to perform the same outcome.

Have a look at this simple code which you can use for your custom UDF

https://gist.github.com/dataminelab/1050002

avatar
Expert Contributor

OK .. thanks for the update.

you mean to say that the hive version which i am using is not supported to the i am looking for ?

avatar
Guru

Your select statement looks right, if the statement is correct then it must be the version which is missing the function md5.

avatar
Expert Contributor

capture.pngI am using Hadoop 2.7X version and hive i am using 1.2 x version and below is the screen shot.

avatar
New Contributor

md5 is not a available function in Hive v1.2. It's available from v1.3.