
Skew issue in Spark SQL

Hi,

I am joining two tables, and one of them is skewed. How can I handle this in Spark SQL? I am using Spark 2.2.1 on AWS EMR.

Please advise.

Re: Skew issue in Spark SQL

@elango vaithiyanathan

Perhaps you could partition your data differently, using another column whose values are (hopefully) distributed evenly.
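To judge whether a candidate column is evenly distributed, one rough check is to compare the largest key's row count with the average rows per key. The snippet below is a minimal plain-Python sketch of that check, not Spark code; the column names (`country`, `order_id`), the sample rows, and the `skew_ratio` helper are all hypothetical.

```python
from collections import Counter

def skew_ratio(rows, column):
    """Rows under the most frequent value divided by the mean rows per value.

    A ratio of 1.0 means the column is perfectly even; larger means more skew.
    """
    counts = Counter(row[column] for row in rows)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean

# Hypothetical sample: 'country' is dominated by one value, 'order_id' is unique per row.
rows = [{"country": "US" if i % 10 else "DE", "order_id": i} for i in range(1000)]

ratios = {col: skew_ratio(rows, col) for col in ("country", "order_id")}
best = min(ratios, key=ratios.get)  # candidate column with the least skew
```

On a real table the same counts would typically come from a group-by on each candidate column, computed on a sample if the table is large.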

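Alternatively, you could build an artificial numeric column by salting and partition by that column. Salting means attaching a random number to each row of the skewed table so that rows sharing one heavy key are spread over several partitions instead of landing in a single one.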
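To make the salting idea concrete, here is a minimal plain-Python sketch of a salted join. It only simulates the logic and is not Spark code: the table contents, `NUM_SALTS`, and the helper names are hypothetical. The skewed side gets a random salt per row, the other side is copied once per salt value, and the join runs on the pair (key, salt).

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 8  # hypothetical bucket count; tune it to how heavy the skew is

def add_salt(rows, num_salts, seed=0):
    """Attach a random salt in [0, num_salts) to each row of the skewed table."""
    rng = random.Random(seed)
    return [(key, rng.randrange(num_salts), value) for key, value in rows]

def replicate(rows, num_salts):
    """Copy each row of the other table once per salt so every salted key can match."""
    return [(key, salt, value) for key, value in rows for salt in range(num_salts)]

def join_on_key_and_salt(left, right):
    """Inner join on the composite (key, salt); returns (key, left_value, right_value)."""
    index = defaultdict(list)
    for key, salt, value in right:
        index[(key, salt)].append(value)
    return [(key, lv, rv) for key, salt, lv in left for rv in index.get((key, salt), [])]

# Hypothetical data: the key "hot" carries 1000 rows, "cold" only 10.
big = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
small = [("hot", "H"), ("cold", "C")]

salted_big = add_salt(big, NUM_SALTS)
joined = join_on_key_and_salt(salted_big, replicate(small, NUM_SALTS))

# Reference result from an unsalted join, to confirm salting changes nothing.
lookup = dict(small)
plain = [(key, value, lookup[key]) for key, value in big]

# Rows per (key, salt) bucket: the "hot" rows are now spread over NUM_SALTS buckets.
bucket_sizes = Counter((key, salt) for key, salt, _ in salted_big)
```

In Spark SQL the same pattern would usually be written with a salt expression such as `floor(rand() * N)` on the skewed table, the other table expanded to one row per salt value (for example with `explode`), and a join on both the key and the salt. Treat that as an outline under these assumptions rather than a query tested on Spark 2.2.1. The cost is that the replicated table grows N times, so this works best when that side is small.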

HTH
