Created on 09-04-2024 06:14 PM - edited 09-04-2024 06:23 PM
my cluster is cdh5.14 impala 2.11 kudu 1.6.0
impala on kudu,When union all is used to merge more than three select join queries, the third and later select must perform full table scanning. How can I solve the problem of full table scanning?
A full table scan is not performed when the select join statement is executed alone.
sql as follows:
SELECT
mb.unitno,
mb.materialcode,
mb.starttime,
mb.length,
ib.equipmentcode,
ib.defectclass,
ib.defectname,
ib.side
FROM
sc1.tba1 mb
LEFT OUTER JOIN
sc1.tbb1 ib
ON
mb.materialcode = ib.materialcode
where mb.starttime>'20240827'and mb.starttime<'20240828' )
UNION ALL
SELECT
md.unitno,
md.materialcode,
md.starttime,
md.length,
id.equipmentcode,
id.defectclass,
id.defectname,
id.side
FROM
sc1.tba2 md
LEFT OUTER JOIN
sc1.tbb2 id
ON
md.materialcode = id.materialcode
where md.starttime>'20240827'and md.starttime<'20240828' )
UNION ALL
SELECT
me.unitno,
me.materialcode,
me.starttime,
me.length,
ie.equipmentcode,
ie.defectclass,
ie.defectname,
ie.side
FROM
sc1.tba3 me
LEFT OUTER JOIN
sc1.tbb3 ie
ON
me.materialcode = ie.materialcode
where me.starttime>'20240827'and me.starttime<'20240828' );
The number of tbb1, tbb2, and tbb3 rows is 800 million,The number of tba1, tba2, and tbba3 rows is 10000. When executing the above sql, a full table scan is performed on sc1.tbb3.
Created 10-15-2024 08:33 AM
Hi @mrblack , how do you know that Impala performs a full table scan?
Created 11-21-2024 10:40 PM
Hi @mrblack
To avoid full table scan you follow these tips:
1. Ensure proper partition pruning:
2. Re write the query with sub queries.
3. Add explicit hints for join behaviour. Impala supports join hints like brodcast and shuffle that can influence query planning.
After optimising check the explain plan.
Regards,
Chethan YM