I am trying to verify cogroup, join, and groupByKey for pair RDDs. I was able to do this with the Spark Java API, but I cannot get it to work in a Scala project.
Below is the simple code that I tried. Please let me know where I made a mistake.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// Implicit conversions that provide the pair RDD operations (required before Spark 1.3)
import org.apache.spark.SparkContext._

object PairsCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Read both input files and split each line into words
    val lines = sc.textFile("/home/test1.txt")
    val lines2 = sc.textFile("/home/test2.txt")
    val words = lines.flatMap { x => x.split("\\W+") }
    val words2 = lines2.flatMap { x => x.split("\\W+") }
    // Key each word by its length
    val pairs: RDD[(Int, String)] = words.map { x => (x.length, x) }
    val pairs2: RDD[(Int, String)] = words2.map { x => (x.length, x) }
    // --> Here I tried to call the join/cogroup functions that apply to pair RDDs, but could not. If I call join, it gives an error.
  }
}
Thank you in advance.