Created on 11-30-2015 02:39 AM - edited 11-30-2015 02:53 AM
Hello :)
I am trying to convert an RDD[(Int,String)] to a single String.
The goal is to concatenate all String values into one CSV string; the difficulty is keeping the order defined by the Int values.
The input RDD[(Int,String)] looks like the following example:
(166,"A"), (2,"B"), (200,"C"), (100,"D")
The expected output String should look like "B","D","A","C"
Here is the code example with all details:
val data = sc.parallelize(Seq((166,"A"),(2,"B"),(200,"C"),(100,"D"))) // RDD[(Int, Any)]
val keyedSortedData = data.keyBy(x => x._1).sortByKey() // RDD[(Int, (Int, Any))]
val sortedData = keyedSortedData.map({case (a,(b,c)) => c}) // RDD[Any]
val reducedData = sortedData.reduce((valueA, valueB) => valueA + "," + valueB).toString // String in CSV format
println(reducedData)
// Execution 1 - Result is: B,D,C,B
// Execution 2 - Result is: A,C,D,B
// Execution 3 - Result is: C,B,A,D
// ...
I cannot manage to get a fixed order. According to the documentation, the map() and reduce() operations should not alter the order defined by keyBy() followed by sortByKey(), but they seem to alter it.
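One possible explanation is that reduce() is documented as requiring a commutative and associative operator, and string concatenation is not commutative, so the order in which per-partition results are merged can differ between runs. Below is a minimal sketch of an alternative, assuming the data is small enough to fit in driver memory. The object name OrderedCsv, the helper toCsv and the local[2] master are hypothetical choices made only for this illustration. It relies on collect() returning elements in partition order after sortByKey().

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OrderedCsv {
  // Hypothetical helper (not part of Spark): builds the CSV on the driver.
  // Assumption: the whole RDD fits in driver memory.
  def toCsv(data: RDD[(Int, String)]): String =
    data
      .sortByKey()    // global ordering by the Int key
      .values         // keep only the String values, partition order unchanged
      .collect()      // returns the elements in partition order
      .mkString(",")  // join on the driver, so the order stays deterministic

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with 2 threads, for illustration only.
    val conf = new SparkConf().setAppName("ordered-csv").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(Seq((166, "A"), (2, "B"), (200, "C"), (100, "D")))
      println(toCsv(data)) // prints B,D,A,C
    } finally {
      sc.stop()
    }
  }
}
```

The join happens on the driver rather than inside reduce(), so the result does not depend on the order in which partitions finish.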
Thanks in advance for any help or remarks.