Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive cost-based-optimizer behaviour change for null checks against complex types

Highlighted

Hive cost-based-optimizer behaviour change for null checks against complex types

New Contributor

Hi there,

 

We are in the process of migrating to HDP 3.1.4, from an older HDP stack.

We have noticed strange behaviour regarding the CBO and how it treats null checks against complex types.

 

consider the following:


CREATE TABLE test_null_array (null_array ARRAY<STRING>);
INSERT OVERWRITE TABLE test_null_array SELECT EXPLODE(ARRAY(NULL, ARRAY('foobar')));
SELECT * FROM test_null_array; -- returns two rows (NULL, ["foobar"])

SET hive.cbo.enable=false;
SELECT * FROM test_null_array WHERE null_array IS NULL; -- returns one row (NULL) as expected
EXPLAIN SELECT * FROM test_null_array WHERE null_array IS NULL; -- line 9: predicate:null_array is null

SET hive.cbo.enable=true;
SELECT * FROM test_null_array WHERE null_array IS NULL; -- returns zero rows *unexpected*
EXPLAIN SELECT * FROM test_null_array WHERE null_array IS NULL; -- line 9: predicate:false

 We have a suspicion that this is new behaviour for HIVE 3. Has anyone else noticed this behaviour, can direct me to documentation about this?

 

Thank you!

1 REPLY 1

Re: Hive cost-based-optimizer behaviour change for null checks against complex types

New Contributor

Update:

I've managed to find a related issue when selecting distinct with the same clause:

 

select distinct null_array from test_null_array where null_array is null;
FAILED: NullPointerException null

 

 Any thoughts?

Don't have an account?
Coming from Hortonworks? Activate your account here