Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: The Cloudera Community will undergo maintenance on Saturday, August 17 at 12:00am PDT. See more info here.

Join intermediate results contain more elements than left or right side

Highlighted

Join intermediate results contain more elements than left or right side

New Contributor

I have 2 tables which are joined. All entries in both tables are unique. Now in the query planner and exec summary it shows more records coming out of the hash join operator than both input sides combined.

 

Left table:                  589,740 records
Right table:                 116,776 records
Intermediate join result: 92,291,634 records

 

query-plan.png

 

There is a group by afterwards which brings the result back down to the left table number of records: Eventual end result: 589,740


My assumption was that the join result can be only the number of records of either the left or the right side (left side in this case as it is a left join) but not more without having duplicate records.

 

Is there anything that could cause this?

 

1 REPLY 1

Re: Join intermediate results contain more elements than left or right side

Rising Star
Hi Fenton,

Thanks for the information,

Join condition could create cross product which result in the multiplication of records(As shown in the screen shot).

Could you please share the query plan and join condition to check further.

explain <query>

Thanks
Jerry