Apart from specifying the number of partitions when creating a DataFrame, or using coalesce/repartition, is there any configuration setting or parameter we can change so that the default number of partitions (200) is reduced?
@Dinesh Chitlangia, could you help me with this?
Spark provides multiple APIs to repartition data.
If you want to decrease the number of partitions in the initial RDD:
1. Try using a CombineFileInputFormat, which packs multiple files into each input split and therefore produces fewer partitions (tasks).
* coalesce is more efficient than repartition for reducing the partition count, since it merges existing partitions and avoids a full shuffle of the data.