Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive - Do we have checksum in hive ?

Highlighted

Hive - Do we have checksum in hive ?

Rising Star

In Hive, I want to compare the data between two tables and i want to generate a checksum for each column and then compare each checksum for each column.

Appreciate if you could let me know if there is any checksum function in hive.

11 REPLIES 11

Re: Hive - Do we have checksum in hive ?

Super Guru

You can use md5 on each column

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

stringmd5(string/binary)Calculates an MD5 128-bit checksum for the string or binary (as of Hive 1.3.0). The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Example: md5('ABC') = '902fbdd2b1df0c4f70b4a5d23525e932'.

Re: Hive - Do we have checksum in hive ?

Rising Star

Thanks and when i am trying to give command in hive. I am getting SemanticException [Error 10011]: Line 1:7 Invalid function 'md5/'

select md5(location) as md_location from testing;

Basically i am trying to get the checksum on the location column.

Re: Hive - Do we have checksum in hive ?

Guru

Can you share the complete output ? works just fine for me:

0: jdbc:hive2://localhost:10000/default> select md5(col1) as md_location from test1 limit 5;
+-----------------------------------+--+
|            md_location            |
+-----------------------------------+--+
| c4ca4238a0b923820dcc509a6f75849b  |
| c81e728d9d4c2f636f067f89cc14862c  |
| eccbc87e4b5ce2fe28308fd9f2a7baf3  |
| a87ff679a2f3e71d9181a67b7542122c  |
| e4da3b7fbbce2345d7772b0674a318d5  |
+-----------------------------------+--+
5 rows selected (0.115 seconds)



Which version of HDP are you using ?

Re: Hive - Do we have checksum in hive ?

Rising Star

Attached is my screen shot...

capture.png

Re: Hive - Do we have checksum in hive ?

Guru

Thanks for the capture @Praveen PentaReddy, in one of the related posts here

https://community.hortonworks.com/questions/68012/hive-cant-get-the-md5-value.html

It seems like apparently the version of hive which is available below HDP 2.5, might not have that available. That is the same sandbox I used to perform the test at my end. If you don't have that option, you can still write your own custom UDF to perform the same outcome.

Have a look at this simple code which you can use for your custom UDF

https://gist.github.com/dataminelab/1050002

Re: Hive - Do we have checksum in hive ?

Rising Star

OK .. thanks for the update.

you mean to say that the hive version which i am using is not supported to the i am looking for ?

Re: Hive - Do we have checksum in hive ?

Guru

Your select statement looks right, if the statement is correct then it must be the version which is missing the function md5.

Re: Hive - Do we have checksum in hive ?

Rising Star

capture.pngI am using Hadoop 2.7X version and hive i am using 1.2 x version and below is the screen shot.

Re: Hive - Do we have checksum in hive ?

New Contributor

md5 is not a available function in Hive v1.2. It's available from v1.3.

Don't have an account?
Coming from Hortonworks? Activate your account here