Joining Streaming Data with HDFS File

Explorer

Hi, 

I am joining my streaming data with data that is already present in HDFS. When I run the code in the Scala shell, it works fine and the data is joined.

 

But when I compile the same code in Eclipse to build a jar, the join does not work.

 

Please suggest how to solve this issue.

The error occurs in the following part:

// Key the stream records by field 0, with (field 5, field 6) as the value
val streamkv = streamrecs.map(_.split("~")).map(r => (r(0), (r(5), r(6))))
// Key the static HDFS records by field 1, with field 0 as the value
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => (r(1), r(0)))
// One-minute window over the keyed stream
val streamwindow = streamkv.window(Minutes(1))

// Join each windowed RDD with the static HDFS RDD
val join = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines) })

In the last step I get the error: value join is not a member of org.apache.spark.rdd.RDD[(String, (String, String))]

The same code works fine in the Scala shell.

 

I have imported all the necessary packages, as shown below:

 

import scala.io.Source 
import java.io._ 
import org.apache.spark.streaming._ 
import org.apache.spark.streaming.StreamingContext._ 
import org.apache.spark.api.java.function._ 
import org.apache.spark.streaming._ 
import org.apache.spark.streaming.api._ 
import org.apache.spark.streaming.StreamingContext._ 
import StreamingContext._ 
import org.apache.spark.SparkConf 
import org.apache.spark.SparkContext 
import org.apache.spark._ 
import org.apache.spark.rdd.PairRDDFunctions 
import org.apache.spark.streaming.dstream.PairDStreamFunctions 


Accepted Solution

Re: Joining Streaming Data with HDFS File

Master Collaborator

I think you imported just about everything except the one thing you need to get implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:

 

import org.apache.spark.SparkContext._

 

In the shell this is imported by default.
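
For completeness, here is a minimal standalone sketch, assuming the same field layout as in the question, of how the program compiles once that import is in scope. The socket source, batch interval, and object name are placeholders, not taken from the original post:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // brings rddToPairRDDFunctions into scope, which provides join()
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object StreamHdfsJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamHdfsJoin")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))

    // Static lookup data from HDFS, keyed by field 1
    val HDFSlines = sc.textFile("/user/Rest/sample.dat")
      .map(_.split("~"))
      .map(r => (r(1), r(0)))

    // Streaming records keyed by field 0; the socket source here is only an example
    val streamrecs = ssc.socketTextStream("localhost", 9999)
    val streamkv = streamrecs.map(_.split("~")).map(r => (r(0), (r(5), r(6))))
    val streamwindow = streamkv.window(Minutes(1))

    // With SparkContext._ imported, join() resolves on the pair RDD inside transform()
    val joined = streamwindow.transform(rdd => rdd.join(HDFSlines))
    joined.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Only the extra import changes anything here; the rest mirrors the code from the question.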
