Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

SELECT count(myCol) FROM myTable gives wrong result when ORC partitionned table, TEZ engine and vectorization enabled.,hive tez count(col1) returns wrong value

Highlighted

SELECT count(myCol) FROM myTable gives wrong result when ORC partitionned table, TEZ engine and vectorization enabled.,hive tez count(col1) returns wrong value

New Contributor

1-

On hdp-2.3.0.0-25 (hive 1.2.1.2.3, tez 0.7.0.2.3), open an hive or beeline session with following settings :

SET hive.execution.engine=tez;

SET hive.vectorized.execution.enabled=true;

SET hive.vectorized.execution.reduce.enabled=true;

2-

Create an ORC table :

CREATE TABLE IF NOT EXISTS `mytable`(`col1` timestamp,`id` int,`type` int,`id2` int,`fun_id` int,`priority` int)

PARTITIONED BY (`part_id1` int,`part_id2` bigint)

STORED AS ORC;

3-

Insert just one or more rows into this table as given :

INSERT INTO TABLE `mytable` PARTITION (part_id1,part_id2)

VALUES("2016-06-23 14:18:04.0",34,1,21,313928,5,1,20160126000000);

4-

Query Hive to get col1 number in the that table :

SELECT count(fun_id) FROM `mytable`;

5- The query returns 313928 which is wrong ! I was expecting 1 !

6- All suggestions are welcome !

8740-capture.png

,

0- On hdp-2.3.0.0-2557 (hive 1.2.1.2.3, tez 0.7.0.2.3), open an hive or beeline session with following settings :

SET hive.execution.engine=tez;

SET hive.vectorized.execution.enabled=true;

SET hive.vectorized.execution.reduce.enabled=true;

1- Create an ORC table

CREATE TABLE IF NOT EXISTS `adb.mytable`(`col1` timestamp,`id` int,`type` int,`id2` int,`fun_id` int,`priority` nt) PARTITIONED BY (`part_id1` int,`part_id2` bigint) STORED AS ORC;

2- Insert just one row into that table

INSERT INTO TABLE adb.mytable PARTITION (part_id1,part_id2)

VALUES("2016-06-23 14:18:04.0",34,1,21,313928,5,1,20160126000000);

3- Launch following query :

SELECT count(fun_id) FROM `adb.mytable`;

Result will be 313928 which is wrong !!!

I need your help.

2 REPLIES 2

Re: SELECT count(myCol) FROM myTable gives wrong result when ORC partitionned table, TEZ engine and vectorization enabled.,hive tez count(col1) returns wrong value

Rising Star

@Adama DIABATE

I ran the same tests and the results were as expected. Can you paste the output of

select * from mytable

and

select fun_id from mytable
Highlighted

Re: SELECT count(myCol) FROM myTable gives wrong result when ORC partitionned table, TEZ engine and vectorization enabled.,hive tez count(col1) returns wrong value

New Contributor

@Hari Rongali,

Thanks you for replying.

Here in the screenshot the results for more investigationcapture2.png

Don't have an account?
Coming from Hortonworks? Activate your account here