Contributor

The "WHERE ... IN" clause misses rows when the list contains Unicode characters

Hi guys,

The same SQL query returns different results depending on the order of the values in the IN list.

sql1:

select count(1) from
(
select 
id, type 
from temp
where s_id=588 AND j_id=224 AND 
id in ('axiaoyuyuan05210816','chengjing08gk','lijhibj','dilys0706','gaobin13897925284','tb252434778','zexin5311520','刘修培201102','小小兔子1','曹刚剑芹','相濡以沫19890111','我爱大屎粑粑','邓存春1161577805','lichun踏天','zqy新人王者','小扎西呀','怪带劲滴','王冲山','笑影子越','孙立伟0','尹靖心2007','昌红映5078','胡丁幻丶','15838132962唯一','lily雨栀','zhubiao8889','心中丶无爱','最强地球人','梅子潇湘43087','陈嘉颖1984','中国我爱你赵亮','公元1989年','博博kissbobo','塑造de灵魂','欧春春11','灰鑫亚泉化工','谢勇123444','郎个里个郎嘿')
) a;

35
Fetched 1 row(s) in 0.98s

sql2:

select count(1) from
(
select 
id, type
from temp 
where s_id=588 AND j_id=224 AND 
id in ('zhubiao8889','axiaoyuyuan05210816','chengjing08gk','lijhibj','dilys0706','gaobin13897925284','tb252434778','zexin5311520','刘修培201102','小小兔子1','曹刚剑芹','相濡以沫19890111','我爱大屎粑粑','邓存春1161577805','lichun踏天','zqy新人王者','小扎西呀','怪带劲滴','王冲山','笑影子越','孙立伟0','尹靖心2007','昌红映5078','胡丁幻丶','15838132962唯一','lily雨栀','心中丶无爱','最强地球人','梅子潇湘43087','陈嘉颖1984','中国我爱你赵亮','公元1989年','博博kissbobo','塑造de灵魂','欧春春11','灰鑫亚泉化工','谢勇123444','郎个里个郎嘿')
) a;

38
Fetched 1 row(s) in 0.98s

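sql2 returns the correct result (38). The only difference between the two queries is that 'zhubiao8889' has been moved from the middle of the IN list to the front.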
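In standard SQL, `id IN (...)` is a set-membership test, so reordering the literals must not change the count. The sketch below illustrates that expectation using Python's built-in `sqlite3` module as a stand-in engine (not Impala). The `temp` table, its schema and its six rows are hypothetical; only the id strings are taken from the queries above. The same pattern (run the query with several orderings of the list and compare the counts) could serve as a reproducer against any engine.

```python
from __future__ import annotations

import itertools
import sqlite3

# A few ids copied from the queries above; the table contents are hypothetical.
IDS = ["zhubiao8889", "axiaoyuyuan05210816", "刘修培201102",
       "lichun踏天", "15838132962唯一", "郎个里个郎嘿"]


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the `temp` table (schema is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp (id TEXT, type TEXT, s_id INTEGER, j_id INTEGER)")
    conn.executemany(
        "INSERT INTO temp VALUES (?, 't', 588, 224)", [(i,) for i in IDS]
    )
    return conn


def count_in(conn: sqlite3.Connection, values: list[str]) -> int:
    """Count matching rows for one ordering of the IN list (bound parameters)."""
    placeholders = ",".join("?" * len(values))
    sql = (
        "SELECT count(1) FROM temp "
        f"WHERE s_id = 588 AND j_id = 224 AND id IN ({placeholders})"
    )
    return conn.execute(sql, values).fetchone()[0]


def order_independent(conn: sqlite3.Connection, values: list[str]) -> bool:
    """Return True if every permutation of the IN list yields the same count."""
    counts = {count_in(conn, list(p)) for p in itertools.permutations(values)}
    return len(counts) == 1


if __name__ == "__main__":
    conn = make_db()
    print(count_in(conn, IDS))           # 6
    print(order_independent(conn, IDS))  # True
```

Trying all 720 permutations is only practical for a short list; for the 38-value list in the post, comparing a handful of orderings (original, reversed, one value moved to the front) is enough to expose an order-dependent result.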

 

Is there a solution?

I created an issue here: https://issues.cloudera.org/browse/IMPALA-3921

rube.q

Contributor

Can anybody help me?
Cloudera Employee

Hi, we haven't had time to look at the JIRA you filed yet. It looks like you provided data (thank you), so we'll try to reproduce the problem. I increased the priority since I think we should figure out what's going on.

Cloudera Employee

Although I agree that we can probably make this specific case work, please be aware that Impala does not fully support UTF-8 yet.
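To see why incomplete UTF-8 support matters for ids like the ones in this thread: every Chinese character in them occupies three bytes in UTF-8, so any byte-oriented string handling sees a different length than character-aware code does. The Python illustration below only shows that encoding arithmetic; it does not model how Impala stores, hashes or compares strings, and the helper name `byte_vs_char_length` is hypothetical.

```python
from __future__ import annotations


def byte_vs_char_length(value: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(value), len(value.encode("utf-8"))


for s in ["zhubiao8889", "刘修培201102", "lichun踏天"]:
    chars, nbytes = byte_vs_char_length(s)
    print(f"{s}: {chars} chars, {nbytes} bytes")

# zhubiao8889: 11 chars, 11 bytes
# 刘修培201102: 9 chars, 15 bytes
# lichun踏天: 8 chars, 12 bytes
```

Pure-ASCII ids such as 'zhubiao8889' have identical character and byte lengths, while mixed ids do not.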