
Sub-partitioning an RDD in PySpark

In PySpark, using mapPartitionsWithIndex I can iterate over each partition of an RDD and apply an action, but I do not want to increase the number of partitions beyond 10; instead I want to use nested partitioning. The RDD is a pair RDD built along these lines:

rdd = ... .map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b)
for part_id in range(rdd.getNumPartitions()):
    part_rdd = rdd.mapPartitionsWithIndex(generator_func(part_id))
    # partition part_rdd further into 10 partitions:
    part_rdd = part_rdd.partitionBy(10, partition_func)

I am not sure what to do in partition_func(); I thought of using lambda x: x, but it didn't work. Any suggestions on what I should do here?
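
A minimal sketch of one possible approach, assuming rdd is the pair RDD above. The names keep_partition and process_chunk are hypothetical, chosen for illustration; keep_partition plays the role of generator_func:

def keep_partition(target_id):
    # Build a function for mapPartitionsWithIndex that yields only
    # the rows belonging to the partition with index target_id.
    def inner(index, iterator):
        if index == target_id:
            for row in iterator:
                yield row
    return inner

rdd.cache()  # each loop iteration rescans rdd, so caching helps
for part_id in range(rdd.getNumPartitions()):
    part_rdd = rdd.mapPartitionsWithIndex(keep_partition(part_id))
    # partitionBy expects partition_func to map a key to an int,
    # which is why lambda x: x fails for non-integer keys; hashing
    # the key works for any hashable key type.
    sub_rdd = part_rdd.partitionBy(10, lambda key: hash(key))
    sub_rdd.foreachPartition(process_chunk)  # per-sub-partition action

If it does not matter which key lands in which sub-partition, part_rdd.repartition(10) gives the same 10-way split without a partition function at all.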