Created 12-29-2016 06:16 PM
In Hive, I want to compare the data between two tables and i want to generate a checksum for each column and then compare each checksum for each column.
Appreciate if you could let me know if there is any checksum function in hive.
Created 12-29-2016 06:41 PM
You can use md5 on each column
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
string | md5(string/binary) | Calculates an MD5 128-bit checksum for the string or binary (as of Hive 1.3.0). The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Example: md5('ABC') = '902fbdd2b1df0c4f70b4a5d23525e932'. |
Created 12-29-2016 06:51 PM
Thanks and when i am trying to give command in hive. I am getting SemanticException [Error 10011]: Line 1:7 Invalid function 'md5/'
select md5(location) as md_location from testing;
Basically i am trying to get the checksum on the location column.
Created 12-29-2016 07:06 PM
Can you share the complete output ? works just fine for me:
0: jdbc:hive2://localhost:10000/default> select md5(col1) as md_location from test1 limit 5; +-----------------------------------+--+ | md_location | +-----------------------------------+--+ | c4ca4238a0b923820dcc509a6f75849b | | c81e728d9d4c2f636f067f89cc14862c | | eccbc87e4b5ce2fe28308fd9f2a7baf3 | | a87ff679a2f3e71d9181a67b7542122c | | e4da3b7fbbce2345d7772b0674a318d5 | +-----------------------------------+--+ 5 rows selected (0.115 seconds)
Which version of HDP are you using ?
Created 12-29-2016 07:40 PM
Attached is my screen shot...
Created 12-29-2016 08:57 PM
Thanks for the capture @Praveen PentaReddy, in one of the related posts here
https://community.hortonworks.com/questions/68012/hive-cant-get-the-md5-value.html
It seems like apparently the version of hive which is available below HDP 2.5, might not have that available. That is the same sandbox I used to perform the test at my end. If you don't have that option, you can still write your own custom UDF to perform the same outcome.
Have a look at this simple code which you can use for your custom UDF
Created 12-29-2016 11:29 PM
OK .. thanks for the update.
you mean to say that the hive version which i am using is not supported to the i am looking for ?
Created 01-03-2017 07:33 PM
Your select statement looks right, if the statement is correct then it must be the version which is missing the function md5.
Created 12-29-2016 07:22 PM
capture.pngI am using Hadoop 2.7X version and hive i am using 1.2 x version and below is the screen shot.
Created 12-30-2016 06:22 PM
md5 is not a available function in Hive v1.2. It's available from v1.3.