
How to speed up RDD.count()

We have a streaming application that runs a count action on every batch.

tempRequestsWithState is a DStream

tempRequestsWithState.foreachRDD { rdd =>
    // count() is an action, so it triggers evaluation of the whole batch RDD
    println(rdd.count())
}

The count action is very slow: it takes about 30 minutes. We are consuming at 10,000 events/sec, so we would greatly appreciate any suggestions for speeding it up. We also noticed that each RDD has 54 partitions.
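To make the numbers above easier to reproduce, the count could be timed per batch and logged next to the partition count. This is only a minimal sketch: `BatchTiming` and `timed` are hypothetical names that are not part of our application, and because `count()` forces evaluation of the RDD's full lineage, the measured time includes any upstream transformations, not just the counting itself.

```scala
object BatchTiming {
  // Hypothetical helper: evaluates `body` once and returns its result
  // together with the elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}

// Intended use inside the streaming job (requires Spark, shown as comments):
// tempRequestsWithState.foreachRDD { rdd =>
//   val (n, ms) = BatchTiming.timed(rdd.count())
//   println(s"partitions=${rdd.getNumPartitions} count=$n countMs=$ms")
// }
```

Logging these per batch would show whether the time grows with the batch size or stays high even for small batches.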

