Reply
Highlighted
Explorer
Posts: 12
Registered: ‎07-26-2017

Can't get correct result by count function.

Hi , 

 

I can't get correct result by count function in HIVE.

 

1,Table Definition

CREATE TABLE table_test(
user_id string,
title_id string,
os string,
read_type string)
PARTITIONED BY (
dt string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'line.delim'='\n',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

 

2,Data 

select title_id from cc_lcs_user_detail_d where  dt='20180516' and user_id='1001162601' limit 10;
title_id 
na 
na 
 
3,SQL1 (no problem)
select count(distinct title_id) from cc_lcs_user_detail_d where dt='20180516' and user_id='1001162601' limit 10;
_c0 
 
4,SQL2(issue)
select user_id , count(distinct title_id)  from cc_lcs_user_detail_d where dt='20180516' group by user_id limit 10;
 
user_id _c1 _c2 _c3
XXXX......
1001162601 0 
XXXX......
 
I can't understand why i get 0 by the sql4. 
Do you know Why?
 
By the way , The count of table is 28909917 rows in partition:20180516.
 
 

 

Cloudera Employee
Posts: 312
Registered: ‎03-23-2015

Re: Can't get correct result by count function.

What version of CDH or Hive are you using?

Is it only impacting that particular user_id?

Is it easily re-producible by creating a new table?

It looks like a bug to me.
Announcements