Welcome to the Cloudera Community


Impala Queries slow using GROUP BY and LIKE


Hi,

We are testing Apache Impala and have noticed that queries combining GROUP BY with LIKE filters run very slowly, while queries using either one alone run much faster. Here are two examples:

SELECT *
FROM hive.default.pcopy1B
where (lower("by") like '%part%' and lower("by") like '%and%' and lower("by") like '%the%')
   or (lower(title) like '%part%' and lower(title) like '%and%' and lower(title) like '%the%')
   or (lower(url) like '%part%' and lower(url) like '%and%' and lower(url) like '%the%')
   or (lower(text) like '%part%' and lower(text) like '%and%' and lower(text) like '%the%')
limit 100;


Run times: 1.37 s, 1.08 s, 1.35 s
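For reference, the filter shared by both example queries keeps a row when any one of the four columns ("by", title, url, text), lowercased, contains all three substrings 'part', 'and', and 'the'. A minimal Python sketch of that logic, assuming LIKE '%x%' behaves as a plain substring test and that NULL columns never match; the names `column_matches` and `row_matches` are hypothetical:

```python
# Sketch of the WHERE clause above (illustrative, not engine code).
# Assumes LIKE '%x%' is a plain substring test and NULL columns never match.
from typing import Mapping, Optional

TERMS = ("part", "and", "the")
COLUMNS = ("by", "title", "url", "text")


def column_matches(value: Optional[str]) -> bool:
    """lower(col) LIKE '%part%' AND ... LIKE '%and%' AND ... LIKE '%the%'."""
    if value is None:
        # lower(NULL) LIKE ... yields NULL, which WHERE ultimately treats as false
        return False
    lowered = value.lower()
    return all(term in lowered for term in TERMS)


def row_matches(row: Mapping[str, Optional[str]]) -> bool:
    """The four per-column conditions are OR-ed together."""
    return any(column_matches(row.get(col)) for col in COLUMNS)


if __name__ == "__main__":
    rows = [
        {"by": "x", "title": "Part of the Plan and More", "url": None, "text": ""},
        {"by": "the part", "title": "nothing", "url": "", "text": "and"},
    ]
    print([row_matches(r) for r in rows])  # [True, False]
```

Note that the three terms must all appear in the same column; matches spread across different columns (as in the second sample row) do not satisfy the filter.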

select "by", type, ranking, count(*)
from pcopy
where (lower("by") like '%part%' and lower("by") like '%and%' and lower("by") like '%the%')
   or (lower(title) like '%part%' and lower(title) like '%and%' and lower(title) like '%the%')
   or (lower(url) like '%part%' and lower(url) like '%and%' and lower(url) like '%the%')
   or (lower(text) like '%part%' and lower(text) like '%and%' and lower(text) like '%the%')
group by "by", type, ranking
order by 4 desc
limit 10;

Run times: 156.64 s, 155.63 s
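One factor that may contribute, stated here as an assumption rather than a confirmed diagnosis: the first query has no aggregation, so with LIMIT 100 the engine can stop as soon as 100 rows pass the filter, whereas the GROUP BY ... ORDER BY 4 DESC LIMIT 10 query has to apply the LIKE filter to every row before it can rank the groups. The two examples also read different tables (hive.default.pcopy1B and pcopy), so their sizes may differ. The Python sketch below is a hypothetical cost model, not Impala's executor; `limit_scan`, `grouped_top`, and the synthetic table are illustrative names and data.

```python
# Hypothetical cost model (not Impala's actual executor): counts how many rows
# have the filter evaluated. Assumes the LIMIT-only query stops once it has
# enough matching rows, while the GROUP BY query must examine every row.
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def limit_scan(rows: Iterable[Row], pred: Callable[[Row], bool],
               limit: int) -> Tuple[List[Row], int]:
    """Models SELECT * ... WHERE pred LIMIT n with early termination."""
    out: List[Row] = []
    evaluated = 0
    for row in rows:
        evaluated += 1
        if pred(row):
            out.append(row)
            if len(out) == limit:
                break  # enough rows found: the rest of the table is never read
    return out, evaluated


def grouped_top(rows: Iterable[Row], pred: Callable[[Row], bool],
                keys: Sequence[str], limit: int) -> Tuple[List[Tuple[tuple, int]], int]:
    """Models SELECT keys, count(*) ... WHERE pred GROUP BY keys ORDER BY count DESC LIMIT n."""
    counts: Counter = Counter()
    evaluated = 0
    for row in rows:  # no early exit: any later row could change a group's count
        evaluated += 1
        if pred(row):
            counts[tuple(row[k] for k in keys)] += 1
    return counts.most_common(limit), evaluated


if __name__ == "__main__":
    # Synthetic table (hypothetical data): every 10th row matches the filter.
    table = [
        {"by": f"user{i % 5}", "type": "story", "ranking": i % 3,
         "title": "Part and the" if i % 10 == 0 else "other"}
        for i in range(100_000)
    ]

    def matches(row: Row) -> bool:
        title = str(row["title"]).lower()
        return all(term in title for term in ("part", "and", "the"))

    _, scanned_limit = limit_scan(table, matches, 100)
    _, scanned_group = grouped_top(table, matches, ("by", "type", "ranking"), 10)
    print(scanned_limit, scanned_group)  # 991 100000
```

In this model the per-row filter cost is identical in both cases; only the number of rows it is applied to changes. Comparing the two queries on the same table, or timing the GROUP BY query against a plain count(*) with the same WHERE clause, would help separate the aggregation cost from the scan-and-filter cost.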


Could someone please explain why this happens and whether there are any workarounds?

Thanks,

David
