
reduceByKey(_ ++ _)

New Contributor

rdd.map(kv => (kv._1, new Set[String]() + kv._2)).reduceByKey(_ ++ _)

In the code above, what do (kv._1, new Set[String]() + kv._2) and reduceByKey(_ ++ _) do?

I know reduceByKey(_ + _), but not reduceByKey(_ ++ _). Please let me know if anyone can explain this.

Thanks!

1 ACCEPTED SOLUTION

Master Collaborator

The first operation turns each value into a set containing just that one value. ++ concatenates two collections; for sets, the result is their union. So the code builds up, for each key, the set of all values seen for that key. It could be written more simply with groupByKey, converting each group to a set if duplicates need to be dropped. Even as written, this code could be more compact and efficient.
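To make the two steps concrete, here is a minimal plain-Scala sketch that runs without Spark. It is an illustration under stated assumptions, not the original poster's code: the sample data and the names SetUnionPerKey, pairs, singletons and valuesPerKey are hypothetical, and a local groupBy followed by reduce stands in for what reduceByKey does per key. It also uses Set[String]() rather than new Set[String](), since Set in the Scala standard library is a trait and cannot be instantiated with new.

```scala
object SetUnionPerKey {
  // Hypothetical sample data standing in for the RDD of (key, value) pairs.
  val pairs: Seq[(String, String)] =
    Seq(("a", "x"), ("b", "y"), ("a", "z"), ("a", "x"))

  // Step 1: same shape as the map in the question.
  // Each value becomes a one-element set.
  val singletons: Seq[(String, Set[String])] =
    pairs.map(kv => (kv._1, Set[String]() + kv._2))

  // Step 2: local stand-in for reduceByKey(_ ++ _).
  // Group the pairs by key, then union all sets that share a key.
  val valuesPerKey: Map[String, Set[String]] =
    singletons
      .groupBy(_._1)
      .map { case (key, group) => key -> group.map(_._2).reduce(_ ++ _) }

  def main(args: Array[String]): Unit =
    println(valuesPerKey)
}
```

With this sample data, key a ends up with the set containing x and z (the duplicate x is dropped because sets hold distinct elements), and key b ends up with the set containing y. On a real pair RDD, rdd.groupByKey().mapValues(_.toSet) should produce the same per-key sets, which is the simpler form the answer alludes to.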
