The essential point here is that you want to avoid a shuffle, and you can avoid a shuffle if both RDDs are partitioned in the same way, because then all values for the same key are already on 1 partition in each RDD. join calls cogroup so yes both can accomplish this, as long as both RDDs have the same partitioner. This won't be true, however, if you first flatMap one of the RDDs which can't be known to retain the partitioning.