Created 08-20-2018 12:57 PM
--- Hive Queries ---- Please let me know why below merge command throwing the error message?
CREATE DATABASE merge_data; CREATE TABLE merge_data.transactions( ID int, TranValue string, last_update_user string) PARTITIONED BY (tran_date string) CLUSTERED BY (ID) into 5 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');
INSERT INTO merge_data.transactions PARTITION (tran_date) VALUES (1, 'value_01', 'creation', '20170410'), (2, 'value_02', 'creation', '20170410'), (3, 'value_03', 'creation', '20170410'), (4, 'value_04', 'creation', '20170410'), (5, 'value_05', 'creation', '20170413'), (6, 'value_06', 'creation', '20170413'), (7, 'value_07', 'creation', '20170413'), (8, 'value_08', 'creation', '20170413'), (9, 'value_09', 'creation', '20170413'), (10, 'value_10','creation', '20170413');
CREATE TABLE merge_data.merge_source( ID int, TranValue string, tran_date string) STORED AS ORC;
INSERT INTO merge_data.merge_source VALUES (1, 'value_01', '20170410'), (4, NULL, '20170410'), (7, 'value_77777', '20170413'), (8, NULL, '20170413'), (8, 'value_08', '20170415'), (11, 'value_11', '20170415');
MERGE INTO merge_data.transactions AS T USING merge_data.merge_source AS S ON T.ID = S.ID WHEN MATCHED THEN UPDATE SET TranValue = S.TranValue, last_update_user = 'merge_update' WHEN NOT MATCHED THEN INSERT VALUES (S.ID, S.TranValue, 'merge_insert', S.tran_date);
-- Error --
Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0,_col1)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0,_col1)
Caused by: java.lang.RuntimeException: Cardinality Violation in Merge statement: [61, 3, 0],20170415
Created 08-20-2018 02:09 PM
The issue is because of duplicate rows.
When 'not match' is combined with 'match' under the Merge statement, then the cardinality check is applied by default.
The Cardinality check needs to be disabled when using both 'matched' and 'not matched'.
Set the following property in your hive shell and then try to execute the merge statement again.
Set hive.merge.cardinality.check=false;
Refer to this support KB article for more details regards to the same exact issue..!!
-
If the Answer helped to resolve your issue, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.
Created 08-20-2018 04:15 PM
hive.merge.cardinality.check=false is a bad idea. The logic controlled by this property checks if the ON clause of your Merge statement is such that more than 1 row from source side matches the same row from target side (which only happens in WHEN MATCHED clause). Logically what this means is that the query is asking the system to update 1 existing row in target in 2 (or more) different ways. This check is actually part of SQL standard definition of how Merge should work. You either need examine your data or the ON clause but disabling this check, when it throws a cardinality_violation error, may lead to data corruption later.