Support Questions
Find answers, ask questions, and share your expertise

How to change the default partition created for RDD in spark

Apart from specifying the number of partitions when creating a DataFrame, or using coalesce/repartition afterwards, is there any configuration parameter we can change so the default number of partitions (200) is reduced?

@Dinesh Chitlangia could you help me with this.

1 REPLY 1

Re: How to change the default partition created for RDD in spark

Spark provides multiple APIs to repartition an RDD:
1. RDD.groupBy(num_partition)
2. RDD.repartition(num_partition)
3. RDD.coalesce(num_partition)

If you want to decrease the number of partitions in the initial RDD (the one created when reading the input files):
1. Try using a CombineFileInputFormat, which packs multiple input splits into one and therefore produces fewer partitions (tasks).

* coalesce is more efficient than repartition when decreasing the partition count, because it merges existing partitions instead of performing a full shuffle of the data.