Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

reduceByKey(_ ++ _)

Frequent Visitor

rdd.map(kv => (kv._1, new Set[String]() + kv._2)) .reduceByKey(_ ++ _)

In the code above, what do (kv._1, new Set[String]() + kv._2) and reduceByKey(_ ++ _) do?

I understand reduceByKey(_ + _), but not reduceByKey(_ ++ _). Could someone who knows explain it?

Thanks!

Accepted solution

Master Collaborator

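The first operation turns each value into a set containing just that value. ++ concatenates two collections; applied to two sets, it returns their union, so duplicate values collapse. Where _ + _ would sum two numbers, _ ++ _ merges two sets, so reduceByKey(_ ++ _) builds up the set of all distinct values for each key. This could be written more simply with groupByKey (then converting each group to a set), and even this code could be made more compact and efficient.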
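To make the answer concrete, here is a minimal sketch that runs without Spark. It assumes a plain Scala Seq of pairs can stand in for the RDD, and reduceByKeyLocal is a hypothetical helper that only mimics, on one machine, how reduceByKey merges all values sharing a key. It also writes the question's new Set[String]() as Set[String](), since with the default Scala Set (a trait) the new form does not compile.

```scala
// Spark-free sketch of rdd.map(...).reduceByKey(_ ++ _).
// Assumption: a local Seq stands in for the RDD; reduceByKeyLocal is a
// hypothetical helper, not part of Spark or the Scala standard library.
object ReduceByKeySketch {

  // Merge all values that share a key with f, similar in spirit to
  // reduceByKey (Spark does not guarantee the order of merging).
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(existing) => acc.updated(k, f(existing, v))
        case None           => acc.updated(k, v)
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data; key "a" has the value "x" twice.
    val data = Seq("a" -> "x", "b" -> "y", "a" -> "z", "a" -> "x")

    // Step 1: the map step wraps each value in a one-element Set.
    val singletons: Seq[(String, Set[String])] =
      data.map(kv => (kv._1, Set[String]() + kv._2))

    // Step 2: _ ++ _ unions two sets, so merging per key
    // leaves the distinct values for each key.
    val viaReduce = reduceByKeyLocal(singletons)(_ ++ _)

    // groupByKey-style alternative: group the pairs, then make each group a Set.
    val viaGroup: Map[String, Set[String]] =
      data.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).toSet) }

    println(viaReduce)
    println(viaReduce == viaGroup) // true
  }
}
```

In Spark itself, the groupByKey version the answer mentions would presumably be rdd.groupByKey().mapValues(_.toSet). For a leaner form of the original, one option is rdd.aggregateByKey(Set.empty[String])(_ + _, _ ++ _), which adds each value straight into a per-key set instead of first wrapping it in a one-element set. Neither Spark line is tested here.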