Support Questions

Find answers, ask questions, and share your expertise

Query Runs slow on hive when using NOT LIKE

avatar

Hello Everyone!

 

On Hive 1.1.0 the following query executes for one hour & the table has nearly 2 billion records.

 

select

C1,

C2,

C3,

C4,

from <tablename>

where NOT ( instr(FileName,'<sometext>')> 0

or instr(FileName,'<sometext>')> 0

or instr(FileName,'<sometext>')>0

or FileName='' )

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

and FILENAME not like '%<sometext>%'

..........;

 

 

Kindly suggest some better ways to optimise the above query.

 

Thanks In Advance,

Janani

 

 

1 ACCEPTED SOLUTION

avatar
Rising Star

@JananiViswa1

 

From the first galance of this problem, I can see that you have a lot of "like" operators in your query. An like operator incurs regular expression matching, which is very costive, and may cause slowness to the query.

 

Have you noticed where the slowness happens? Is it within Hive itself, or is it just the MR job runs for a long time? If it is the MR job that slows everything down, please consider reducing the split size of the job and thus using more mappers to process the input data. To do this, please run below commands before the query:

set mapred.min.split.size=63000000;
set mapred.max.split.size=64000000;

If my assumption is wrong, or you still have problem after applying above change, please give me more info so that I can investigate further:

 

1) Hive log

If you use Hive CLI, please give us the command output

If you use HS2, please give us the HS2 log file or relevant information in it

 

2) mapreduce job configuration and log file

 

3) The definition of your source table (output of "show create table <tbl_name>")

View solution in original post

2 REPLIES 2

avatar
Rising Star

@JananiViswa1

 

From the first galance of this problem, I can see that you have a lot of "like" operators in your query. An like operator incurs regular expression matching, which is very costive, and may cause slowness to the query.

 

Have you noticed where the slowness happens? Is it within Hive itself, or is it just the MR job runs for a long time? If it is the MR job that slows everything down, please consider reducing the split size of the job and thus using more mappers to process the input data. To do this, please run below commands before the query:

set mapred.min.split.size=63000000;
set mapred.max.split.size=64000000;

If my assumption is wrong, or you still have problem after applying above change, please give me more info so that I can investigate further:

 

1) Hive log

If you use Hive CLI, please give us the command output

If you use HS2, please give us the HS2 log file or relevant information in it

 

2) mapreduce job configuration and log file

 

3) The definition of your source table (output of "show create table <tbl_name>")

avatar

Hi,

 

Thank You So Much ! Now the query run time is decreased by 30 mins.

 

Thanks,

Janani