Created on 07-14-2016 03:21 AM - edited 09-16-2022 03:29 AM
Hello Everyone!
On Hive 1.1.0 the following query executes for one hour & the table has nearly 2 billion records.
select
C1,
C2,
C3,
C4,
from <tablename>
where NOT ( instr(FileName,'<sometext>')> 0
or instr(FileName,'<sometext>')> 0
or instr(FileName,'<sometext>')>0
or FileName='' )
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
and FILENAME not like '%<sometext>%'
..........;
Kindly suggest some better ways to optimise the above query.
Thanks In Advance,
Janani
Created 07-21-2016 06:05 PM
From the first galance of this problem, I can see that you have a lot of "like" operators in your query. An like operator incurs regular expression matching, which is very costive, and may cause slowness to the query.
Have you noticed where the slowness happens? Is it within Hive itself, or is it just the MR job runs for a long time? If it is the MR job that slows everything down, please consider reducing the split size of the job and thus using more mappers to process the input data. To do this, please run below commands before the query:
set mapred.min.split.size=63000000; set mapred.max.split.size=64000000;
If my assumption is wrong, or you still have problem after applying above change, please give me more info so that I can investigate further:
1) Hive log
If you use Hive CLI, please give us the command output
If you use HS2, please give us the HS2 log file or relevant information in it
2) mapreduce job configuration and log file
3) The definition of your source table (output of "show create table <tbl_name>")
Created 07-21-2016 06:05 PM
From the first galance of this problem, I can see that you have a lot of "like" operators in your query. An like operator incurs regular expression matching, which is very costive, and may cause slowness to the query.
Have you noticed where the slowness happens? Is it within Hive itself, or is it just the MR job runs for a long time? If it is the MR job that slows everything down, please consider reducing the split size of the job and thus using more mappers to process the input data. To do this, please run below commands before the query:
set mapred.min.split.size=63000000; set mapred.max.split.size=64000000;
If my assumption is wrong, or you still have problem after applying above change, please give me more info so that I can investigate further:
1) Hive log
If you use Hive CLI, please give us the command output
If you use HS2, please give us the HS2 log file or relevant information in it
2) mapreduce job configuration and log file
3) The definition of your source table (output of "show create table <tbl_name>")
Created 07-22-2016 02:18 AM
Hi,
Thank You So Much ! Now the query run time is decreased by 30 mins.
Thanks,
Janani