Accepted Solution

Joining Streaming Data with HDFS File


I am joining my streaming data with data that is already present in HDFS. When I run the code in the Scala shell it works fine, and the data gets joined.


But when I compile the same code in Eclipse to package it as a jar, the join does not compile.


Please suggest how to solve this issue.

The error occurs in the following part:

val streamkv ="~")).map(r => ( r(0), (r(5), r(6)))) 
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => ( r(1), (r(0)))) 
val streamwindow = streamkv.window(Minutes(1)) 

val join = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines)} ) 

At this step I get the error: value join is not a member of org.apache.spark.rdd.RDD[(String, (String, String))]

The same code works fine in the Scala shell.


I have imported all the necessary packages, as shown below:


import org.apache.spark.streaming._ 
import org.apache.spark.streaming.StreamingContext._ 
import org.apache.spark.streaming._ 
import org.apache.spark.streaming.api._ 
import org.apache.spark.streaming.StreamingContext._ 
import StreamingContext._ 
import org.apache.spark.SparkConf 
import org.apache.spark.SparkContext 
import org.apache.spark._ 
import org.apache.spark.rdd.PairRDDFunctions 
import org.apache.spark.streaming.dstream.PairDStreamFunctions 

Cloudera Employee

Re: Joining Streaming Data with HDFS File

I think you imported just about everything except the one thing you need to get implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:


import org.apache.spark.SparkContext._


In the shell this is imported by default, which is why the same code works there but fails to compile in Eclipse.
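
For reference, here is a minimal sketch of how the compiled version might look with that import in place. Several things here are assumptions and not from the original post: the object name StreamHdfsJoin, the helper joinWithReference, the 10 second batch interval, and the socket source on localhost:9999 (the post does not show where the stream comes from). On Spark 1.3 and later the pair-RDD implicits are picked up automatically, so there the extra import should be harmless but unnecessary.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._               // brings join() and other pair-RDD functions into scope on older Spark versions
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits on older Spark versions
import org.apache.spark.streaming.dstream.DStream

object StreamHdfsJoin {

  // Hypothetical helper: joins one windowed batch with the static HDFS data by key.
  def joinWithReference(
      batch: RDD[(String, (String, String))],
      reference: RDD[(String, String)]): RDD[(String, ((String, String), String))] =
    batch.join(reference)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamHdfsJoin")
    val sc = new SparkContext(conf)
    // Batch interval is an assumption; the window (1 minute) must be a multiple of it.
    val ssc = new StreamingContext(sc, Seconds(10))

    // Assumed source, for illustration only.
    val lines: DStream[String] = ssc.socketTextStream("localhost", 9999)

    // Key the stream by field 0, keep fields 5 and 6 as the value.
    val streamkv = lines.map(_.split("~")).map(r => (r(0), (r(5), r(6))))

    // Static data from HDFS, keyed by field 1 with field 0 as the value.
    val hdfsLines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => (r(1), r(0)))

    val streamWindow = streamkv.window(Minutes(1))

    // transform exposes each windowed RDD, so the RDD-level join can be applied to it.
    val joined = streamWindow.transform(rdd => joinWithReference(rdd, hdfsLines))

    joined.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keeping the join in a small function that takes plain RDDs makes it possible to check the join logic with a local SparkContext, without starting a streaming job.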