rdd.map(kv => (kv._1, new Set[String]() + kv._2)).reduceByKey(_ ++ _)
In the code above, what do (kv._1, new Set[String]() + kv._2) and reduceByKey(_ ++ _) do?
I understand reduceByKey(_ + _), but not reduceByKey(_ ++ _). Please let me know if anyone can explain this.
Thanks!
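
For reference, here is a minimal sketch of what the two pieces appear to do, written against plain Scala collections so it runs without Spark. The sample data, the object name SetMergeSketch, and the groupBy-based stand-in for reduceByKey are all assumptions for illustration, not the original job. It also writes Set[String]() instead of new Set[String](), on the assumption that Set here is the standard immutable Set, which is a trait and cannot be instantiated with new.

```scala
object SetMergeSketch {
  // Hypothetical sample data standing in for the RDD's (key, value) pairs.
  val pairs: Seq[(String, String)] =
    Seq(("a", "x"), ("b", "y"), ("a", "z"), ("a", "x"))

  // Same shape as the map call: keep the key, wrap the value in a Set.
  // Set[String]() is an empty Set; `+` adds ONE element to it.
  val wrapped: Seq[(String, Set[String])] =
    pairs.map(kv => (kv._1, Set[String]() + kv._2))

  // Local stand-in for reduceByKey(_ ++ _): group by key, then combine
  // the Sets for each key pairwise. `++` is the union of TWO collections.
  val merged: Map[String, Set[String]] =
    wrapped.groupBy(_._1).map { case (key, group) =>
      (key, group.map(_._2).reduce(_ ++ _))
    }

  def main(args: Array[String]): Unit =
    println(merged)
}
```

So the map step turns each (key, value) into (key, Set(value)), and the reduce step unions the Sets per key, which leaves the distinct values seen for each key. reduceByKey(_ + _) would not fit here because + on a Set expects a single element on the right, not another Set.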