I am trying to join 2 tables and the query that I am using is similar to the below
select * from table t1 join table t2 on
t1.name = t2.name where t1.starttime > t2.starttime and t2.endtime<t2.endtime
The query gets heavily skewed up and takes the longest time in the last reducer and @99% runs for about an hours and fails. If I execute it for single day I get the output. I can send the log file if needed
Hi @Sajitha S Based on the question which you have described I understand for the two table on which you are performing join its getting skewed on the join fields.
However try setting this parameters in hive and execute your hive query.
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
set hive.vectorized.execution.reduce.groupby.enabled = true;
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS colnames;
Also if possible share the logs as well.