
Spark combineByKey usage

Let's say I have the following data, which needs to be saved as "id (list of names)":

1,Daniel
2,Emma
2,Ethan
2,Isabella
2,Jacob
1,Matthew
1,Mia
2,Noah
1,Olivia
1,Sophia

 

RDD

scala> val dataRDD = data.map(x => (x.split(",")(0), x.split(",")(1)))

CombineByKey

scala> val result = 
     | dataRDD.combineByKey(
     | List(_),
     | (x:List[String], y:String) =>
     | y :: x, 
     | (x:List[String],y:List[String]) =>
     | x ::: y)

I can't figure out how this syntax converts the data into the proper format. This is the result:

(2,List(Noah, Jacob, Isabella, Ethan, Emma))
(1,List(Mia, Matthew, Daniel))
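For reference, the three arguments passed to combineByKey above correspond, in order, to its createCombiner, mergeValue, and mergeCombiners parameters. Here is a minimal sketch of how they cooperate, emulated with plain Scala collections rather than Spark itself. The object name, the helper functions, and the split of the ten records into two partitions are assumptions made for illustration; a real cluster may partition the data differently, so the order of names inside each list can vary.

```scala
object CombineByKeySketch {
  // The three functions from the post, named after the combineByKey parameters.
  // createCombiner: called for the first value of a key seen in a partition.
  val createCombiner: String => List[String] = v => List(v)
  // mergeValue: called for each further value of that key in the same partition.
  val mergeValue: (List[String], String) => List[String] = (acc, v) => v :: acc
  // mergeCombiners: called to join the per-partition lists of the same key.
  val mergeCombiners: (List[String], List[String]) => List[String] = (a, b) => a ::: b

  // Hypothetical split of the ten records across two partitions (an assumption).
  val partitions: Seq[Seq[(String, String)]] = Seq(
    Seq("1" -> "Daniel", "2" -> "Emma", "2" -> "Ethan", "2" -> "Isabella", "2" -> "Jacob"),
    Seq("1" -> "Matthew", "1" -> "Mia", "2" -> "Noah", "1" -> "Olivia", "1" -> "Sophia")
  )

  // Step 1: build one combiner per key inside a single partition.
  def combinePartition(records: Seq[(String, String)]): Map[String, List[String]] =
    records.foldLeft(Map.empty[String, List[String]]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case None    => acc + (k -> createCombiner(v)) // first value for this key
        case Some(c) => acc + (k -> mergeValue(c, v))  // later value, same partition
      }
    }

  // Step 2: merge the per-partition combiners key by key.
  def combineAll(parts: Seq[Seq[(String, String)]]): Map[String, List[String]] =
    parts.map(combinePartition).foldLeft(Map.empty[String, List[String]]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, c)) =>
        a.get(k) match {
          case None           => a + (k -> c)
          case Some(existing) => a + (k -> mergeCombiners(existing, c))
        }
      }
    }

  val result: Map[String, List[String]] = combineAll(partitions)

  def main(args: Array[String]): Unit =
    result.toSeq.sortBy(_._1).foreach(println)
}
```

Because mergeValue prepends with `::`, the names within one partition come out in reverse order of arrival, which is why a list such as `List(Noah, Jacob, Isabella, Ethan, Emma)` reads backwards relative to the input. Under the assumed partitioning, this sketch collects all five names for each id.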

 

 
