Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive errors with orc format and booleans

Hive errors with orc format and booleans

New Contributor

Hive is returning inconsistent results with a simple select statement on an ORC format table. This is on HDP 2.5 which I think uses Hive 1.2.

Specific problem is that if b1 is a boolean column and you use a where clause like "where b1=1", it returns results different from "where b1=true". With "where b1=1" it looks like Hive is casting 1 as boolean value true, but it doesn't return all the expected rows.

It might be possible to just use true/false instead of 0/1 everywhere and not get any problems, but someone could easily forget about that and get incorrect query results. Also I don't know if there is some underlying issue that could create other problems here.

You can reproduce the problem with (make sure hive.execution.engine is set to tez):

create table scratch.test_booleans
(n string,
b1 boolean)
stored as orc;

insert into scratch.test_booleans
select 'a' as n,true as b1
union all
select 'b' as n,true as b1
union all
select 'c' as n,false as b1;

insert into scratch.test_booleans
select 'd' as n,true as b1
union
select 'e' as n,false as b1;

--returns expected 5 rows
select * from scratch.test_booleans;

--returns expected 3 rows
select * from scratch.test_booleans where b1=true;

--returns only 1 row??? (just the row from the 2nd insert)
select * from scratch.test_booleans where b1=1;

Some additional notes:

- Problem can be fixed by any of the following: use text instead of orc format, use hive.execution.engine=mr instead of tez when running the insert, use "union" instead of "union all" everywhere

- Setting hive.execution.engine from tez to mr does not help if you do it just when running the select statements

Is this related to some known issue? I have seen mentions of TEZ having trouble with union all. We have a large Hive table that is showing this issue and I'm worried that the orc files are somehow corrupt and should be re-created.

Don't have an account?
Coming from Hortonworks? Activate your account here