Support Questions
Find answers, ask questions, and share your expertise
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Local join per node instead of data exchange - co-located blocks


Local join per node instead of data exchange - co-located blocks


In order to prevent network traffic, I'd like to perform local joins on each node instead of exchanging the data and perform a join over the complete data afterwards. My query is basically a join over three three tables on an ID attribute. The blocks are perfectly distributed, so that e.g. Table A - Block 0  and Table B - Block 0  are on the same node. These blocks contain all data rows with an ID range [0,1]. Table A - Block 1  and Table B - Block 1 with an ID range [2,3] are on another node etc. So I want to perform a local join per node because any data exchange would be unneccessary (except for the last step when the final node recevieves all results of the other nodes). 

At the moment the query plan looks like follows, although the blocks are already perfectly distributed.



I would like to skip the exchange steps and perform the joins locally. Is this possible? 

Don't have an account?
Coming from Hortonworks? Activate your account here