Spark combineByKey usage

Let's say I have the following data, which needs to be saved as "id (list of names)":

1,Daniel
2,Emma
2,Ethan
2,Isabella
2,Jacob
1,Matthew
1,Mia
2,Noah
1,Olivia
1,Sophia

RDD

scala> val dataRDD = data.map(x => (x.split(",")(0), x.split(",")(1)))

CombineByKey

scala> val result =
     | dataRDD.combineByKey(
     | List(_),
     | (x: List[String], y: String) => y :: x,
     | (x: List[String], y: List[String]) => x ::: y)

I can't figure out how this syntax converts the data into the proper format. The output is:

(2,List(Noah, Jacob, Isabella, Ethan, Emma))
(1,List(Mia, Matthew, Daniel))
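
One way to see what the three arguments do is to model combineByKey on plain Scala collections. The sketch below is not Spark's implementation; it only mimics the documented roles of the three functions (createCombiner, mergeValue, mergeCombiners). The object name, the helper combinePartition, and the way the sample rows are split into two "partitions" are assumptions made for illustration.

```scala
// Plain-Scala model of combineByKey; no Spark required.
// All names below are hypothetical and exist only for this sketch.
object CombineByKeySketch {

  // Step 1: inside ONE partition, fold the (key, value) pairs into a Map.
  def combinePartition[K, V, C](
      partition: Seq[(K, V)],
      createCombiner: V => C,
      mergeValue: (C, V) => C): Map[K, C] =
    partition.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc.get(k) match {
        // First time this key is seen in the partition: List(_) in the question.
        case None    => acc.updated(k, createCombiner(v))
        // Key already has a combiner: y :: x in the question.
        case Some(c) => acc.updated(k, mergeValue(c, v))
      }
    }

  // Step 2: merge the per-partition Maps key by key.
  def combineByKey[K, V, C](
      partitions: Seq[Seq[(K, V)]],
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C): Map[K, C] =
    partitions
      .map(p => combinePartition(p, createCombiner, mergeValue))
      .foldLeft(Map.empty[K, C]) { (acc, partial) =>
        partial.foldLeft(acc) { case (m, (k, c)) =>
          m.get(k) match {
            case None       => m.updated(k, c)
            // Same key came from two partitions: x ::: y in the question.
            case Some(prev) => m.updated(k, mergeCombiners(prev, c))
          }
        }
      }

  // Assumed split of the first five sample rows into two partitions.
  val partitions: Seq[Seq[(String, String)]] = Seq(
    Seq("1" -> "Daniel", "2" -> "Emma", "2" -> "Ethan"),
    Seq("2" -> "Isabella", "1" -> "Matthew")
  )

  val result: Map[String, List[String]] =
    combineByKey[String, String, List[String]](
      partitions,
      (v: String) => List(v),
      (acc: List[String], v: String) => v :: acc,
      (a: List[String], b: List[String]) => a ::: b
    )
  // result("1") == List("Daniel", "Matthew")
  // result("2") == List("Ethan", "Emma", "Isabella")
}
```

Because mergeValue prepends with ::, names within a partition end up in reverse order of arrival, which would explain why the list for key 2 in the output above reads Noah, Jacob, Isabella, Ethan, Emma. The order across partitions depends on how the data happens to be split, so it is best not to rely on it.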