Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Can't get correct result by count function.

Can't get correct result by count function.

Explorer

Hi , 

 

I can't get correct result by count function in HIVE.

 

1,Table Definition

CREATE TABLE table_test(
user_id string,
title_id string,
os string,
read_type string)
PARTITIONED BY (
dt string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'line.delim'='\n',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

 

2,Data 

select title_id from cc_lcs_user_detail_d where  dt='20180516' and user_id='1001162601' limit 10;
title_id 
na 
na 
 
3,SQL1 (no problem)
select count(distinct title_id) from cc_lcs_user_detail_d where dt='20180516' and user_id='1001162601' limit 10;
_c0 
 
4,SQL2(issue)
select user_id , count(distinct title_id)  from cc_lcs_user_detail_d where dt='20180516' group by user_id limit 10;
 
user_id _c1 _c2 _c3
XXXX......
1001162601 0 
XXXX......
 
I can't understand why i get 0 by the sql4. 
Do you know Why?
 
By the way , The count of table is 28909917 rows in partition:20180516.
 
 

 

1 REPLY 1
Highlighted

Re: Can't get correct result by count function.

Guru
What version of CDH or Hive are you using?

Is it only impacting that particular user_id?

Is it easily re-producible by creating a new table?

It looks like a bug to me.