Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Joining Streaming Data with HDFS File

Explorer

Hi, 

I am joining my streaming data with data that is already present in HDFS. When I use the Scala shell it works fine, and the data gets joined.

 

But when I try to compile the same code in Eclipse to build it as a jar, the join part does not work.

 

Please give me some suggestions to solve the issue.

I am getting the error in the following part:

val streamkv = streamrecs.map(_.split("~")).map(r => (r(0), (r(5), r(6))))
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => (r(1), r(0)))
val streamwindow = streamkv.window(Minutes(1))

val join = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines) })

In this step I am getting the error: value join is not a member of org.apache.spark.rdd.RDD[(String, (String, String))]

I used the same code in the scala-shell, and there it works fine.

 

I have imported all the necessary packages, as below:

 

import scala.io.Source 
import java.io._ 
import org.apache.spark.streaming._ 
import org.apache.spark.streaming.StreamingContext._ 
import org.apache.spark.api.java.function._ 
import org.apache.spark.streaming._ 
import org.apache.spark.streaming.api._ 
import org.apache.spark.streaming.StreamingContext._ 
import StreamingContext._ 
import org.apache.spark.SparkConf 
import org.apache.spark.SparkContext 
import org.apache.spark._ 
import org.apache.spark.rdd.PairRDDFunctions 
import org.apache.spark.streaming.dstream.PairDStreamFunctions 


1 ACCEPTED SOLUTION

Master Collaborator

I think you imported just about everything except the one thing you need to get implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:

 

import org.apache.spark.SparkContext._

 

In the shell this is imported by default.
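The mechanism behind the error can be illustrated without Spark: join() is defined on a wrapper class (PairRDDFunctions), and the compiler can only find it when an implicit conversion to that wrapper is in scope. Below is a minimal plain-Scala sketch of the same pattern; the Pairs wrapper, Implicits object, and seqToPairs conversion are toy names for illustration, not Spark's actual code.

```scala
import scala.language.implicitConversions

// Toy analogue of Spark's PairRDDFunctions: join() lives on a wrapper class,
// not on the collection type itself.
class Pairs[K, V](val data: Seq[(K, V)]) {
  // a naive nested-loop join on matching keys
  def join[W](other: Pairs[K, W]): Seq[(K, (V, W))] =
    for ((k, v) <- data; (k2, w) <- other.data if k == k2) yield (k, (v, w))
}

object Implicits {
  // Without this implicit in scope, left.join(right) fails to compile with
  // "value join is not a member of Seq[(String, Int)]" -- the same shape of
  // error as the Spark one above.
  implicit def seqToPairs[K, V](s: Seq[(K, V)]): Pairs[K, V] = new Pairs(s)
}

object Demo extends App {
  import Implicits._ // analogue of: import org.apache.spark.SparkContext._
  val left  = Seq(("a", 1), ("b", 2))
  val right = Seq(("a", "x"), ("c", "y"))
  println(left.join(right)) // List((a,(1,x)))
}
```

With the `import Implicits._` line removed, the same call no longer compiles, which mirrors why the original code works in spark-shell (where the conversions are pre-imported) but not in a compiled jar.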

