04-20-2018 08:30 PM
This could be a data skew issue. Check whether any partition holds a much larger chunk of the data than the rest.

https://github.com/adnanalvee/spark-assist/blob/master/spark-assist.scala

From the link above, copy the function "partitionStats" and pass in your data as a DataFrame. It will show the maximum, minimum, and average amount of data across your partitions, like below.

+------+-----+------------------+
|MAX   |MIN  |AVERAGE           |
+------+-----+------------------+
|135695|87694|100338.61149653122|
+------+-----+------------------+
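
If you would rather not copy the file, here is a rough sketch of how a similar check could be written with the standard Spark SQL functions. This is not the code from the linked repository: the object name `PartitionSkewCheck`, the helper `partitionRowStats`, and the sample data in `main` are all hypothetical, and it assumes "amount of data" means row count per partition.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, max, min, spark_partition_id}

object PartitionSkewCheck {

  // Hypothetical helper (not the linked repo's partitionStats):
  // returns a single row with the largest, smallest, and mean row count
  // per partition. Partitions with zero rows produce no group, so they
  // are not reflected in MIN or AVERAGE.
  def partitionRowStats(df: DataFrame): DataFrame = {
    df.withColumn("partition_id", spark_partition_id()) // tag each row with its partition
      .groupBy("partition_id")
      .count()                                          // rows per partition, column "count"
      .agg(
        max("count").as("MAX"),
        min("count").as("MIN"),
        avg("count").as("AVERAGE")
      )
  }

  def main(args: Array[String]): Unit = {
    // Local session and synthetic data, for illustration only.
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("partition-skew-check")
      .getOrCreate()

    val df = spark.range(0, 1000000).toDF("id").repartition(8)
    partitionRowStats(df).show(false)

    spark.stop()
  }
}
```

If MAX comes out several times larger than AVERAGE, a few partitions are carrying most of the rows, which is the skew pattern described above.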