How to change the default partition created for RDD in spark

Apart from specifying the number of partitions when creating a DataFrame/RDD, or using coalesce/repartition afterwards, is there any configuration parameter we can change so that the default number of partitions (200) is reduced?
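
For reference, a minimal sketch of those existing options (Scala; the SparkSession, input path, and partition counts below are placeholders, not from the original post):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partition-demo").getOrCreate()
    val sc = spark.sparkContext

    // Default parallelism used when an RDD is created without an explicit count
    println(sc.defaultParallelism)

    // Specify the partition count up front, at creation time
    val rdd = sc.textFile("/data/input", minPartitions = 8)
    val df  = spark.range(0L, 1000000L, 1L, numPartitions = 8).toDF("id")

    // ...or change it afterwards
    val viaRepartition = df.repartition(4)   // full shuffle to exactly 4 partitions
    val viaCoalesce    = rdd.coalesce(4)     // merges existing partitions instead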

@Dinesh Chitlangia, could you help me with this?


Re: How to change the default partition created for RDD in spark

Spark provides several APIs to change the number of partitions of an existing RDD (see the sketch after this list):
1. RDD.groupBy(func, numPartitions)
2. RDD.repartition(numPartitions)
3. RDD.coalesce(numPartitions)
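
A minimal sketch of these calls (assuming rdd is an existing RDD[String]; the grouping function and partition counts are placeholders):

    // groupBy takes the grouping function plus a target partition count
    val grouped = rdd.groupBy((line: String) => line.length, numPartitions = 10)

    // repartition always shuffles to exactly the requested count
    val repartitioned = rdd.repartition(10)

    // coalesce merges existing partitions when decreasing the count
    val coalesced = rdd.coalesce(10)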

If you want to decrease the number of partitions in the initial RDD (the one created when the input is first read):
1. Try using a CombineFileInputFormat, which packs multiple input files into each split and therefore produces fewer partitions (tasks); a sketch follows below.
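
A sketch of that approach, assuming the input is a directory of many small text files (the path and split size are placeholders): Hadoop's CombineTextInputFormat, a bundled subclass of CombineFileInputFormat, packs several files into each split, so the resulting RDD starts with fewer partitions.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

    // Cap each combined split at 128 MB (placeholder value)
    sc.hadoopConfiguration.set(
      "mapreduce.input.fileinputformat.split.maxsize",
      (128 * 1024 * 1024).toString)

    val combined = sc.newAPIHadoopFile(
        "/data/many-small-files",          // hypothetical input path
        classOf[CombineTextInputFormat],
        classOf[LongWritable],
        classOf[Text])
      .map(_._2.toString)                  // keep only the line contents

    println(combined.getNumPartitions)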

* When reducing the partition count, coalesce is more efficient than repartition because it merges existing partitions instead of performing a full shuffle of the data.
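
One way to see the difference (a sketch; rdd is any existing RDD): repartition adds a shuffle stage to the lineage, while coalesce with its default shuffle = false only merges parent partitions.

    // The lineage of repartition(4) contains a ShuffledRDD
    println(rdd.repartition(4).toDebugString)

    // The lineage of coalesce(4) shows a CoalescedRDD and no shuffle
    println(rdd.coalesce(4).toDebugString)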
