I am joining two tables, and one of them is skewed. How can I handle this in Spark SQL? I am using Spark 2.2.1 on AWS EMR.
Please help.
Perhaps you could partition your data by a different column, one whose values are (hopefully) distributed evenly.
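One way to compare candidate columns before repartitioning is to count how many rows each column would send to each hash partition. The sketch below does this in plain Python, not Spark. The MD5-based `partition_of` is a hypothetical stand-in for Spark's Murmur3 hash partitioner, and the `order_id`/`customer_id` rows are made-up sample data.

```python
import hashlib

def partition_of(value, num_partitions):
    # Hypothetical stand-in for Spark's hash partitioner (Spark uses Murmur3);
    # MD5 is used only because it gives the same result on every Python run.
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def partition_sizes(rows, column, num_partitions):
    # Count how many rows land in each partition when hashing `column`.
    sizes = [0] * num_partitions
    for row in rows:
        sizes[partition_of(row[column], num_partitions)] += 1
    return sizes

def skew_ratio(sizes):
    # Largest partition divided by the average size; 1.0 means perfectly even.
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean

# Hypothetical data: 900 of 1000 orders belong to customer 1.
rows = [{"order_id": i, "customer_id": 1 if i < 900 else i} for i in range(1000)]

for column in ("customer_id", "order_id"):
    sizes = partition_sizes(rows, column, 4)
    print(column, sizes, round(skew_ratio(sizes), 2))
```

Because all 900 rows for customer 1 hash to the same partition, `customer_id` gives a ratio of at least 3.6 with four partitions, while the unique `order_id` spreads rows close to evenly.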
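Alternatively, you could build an artificial (numeric) column by salting, and partition by that column. Salting adds a random number to each row of the skewed table so that the rows of a hot key are spread across several partitions. For a join, the other table must then be replicated once per salt value so that every salted key still finds its match.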
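A minimal plain-Python sketch of a salted join follows. It assumes the skewed table is the large one and the other table is small enough to replicate. `salted_join`, `NUM_SALTS` and the sample rows are hypothetical names for illustration. In Spark SQL the same idea is commonly expressed by adding a column such as `floor(rand() * N)` to the skewed table, cross-joining the other table with the N salt values, and joining on both the original key and the salt.

```python
import random
from collections import defaultdict

NUM_SALTS = 8  # assumption: roughly how many ways the hot key should be split

def salted_join(skewed, other, key, num_salts, seed=0):
    """Inner-join two lists of dict rows on `key` using salting."""
    rng = random.Random(seed)
    # Replicate every row of the non-skewed side once per salt value.
    replicated = defaultdict(list)
    for row in other:
        for salt in range(num_salts):
            replicated[(row[key], salt)].append(row)
    joined = []
    bucket_sizes = defaultdict(int)  # rows of the skewed side per (key, salt)
    for row in skewed:
        # Each skewed row gets one random salt, so a hot key is split up.
        salted_key = (row[key], rng.randrange(num_salts))
        bucket_sizes[salted_key] += 1
        for match in replicated.get(salted_key, []):
            joined.append({**match, **row})
    return joined, dict(bucket_sizes)

# Hypothetical data: "hot" has 1000 rows on the skewed side, "cold" only 10.
skewed = [{"k": "hot", "v": i} for i in range(1000)] + [{"k": "cold", "v": i} for i in range(10)]
other = [{"k": "hot", "name": "H"}, {"k": "cold", "name": "C"}]

joined, buckets = salted_join(skewed, other, "k", NUM_SALTS)
hot_buckets = {salt: n for (k, salt), n in buckets.items() if k == "hot"}
print(len(joined), hot_buckets)
```

The join still returns 1010 rows, the same as an ordinary join, while the 1000 "hot" rows are divided among eight buckets instead of one. Replicating the other side multiplies its size by the number of salts, so a modest salt count is usually the better trade-off.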