Joining Streaming Data with HDFS File

ArunShell — Fri, 16 Sep 2022 09:07:18 GMT

Hi,

I am joining my streaming data with data that is already present in HDFS. When I use the Scala shell it works fine, and the data gets joined.

But when I try to compile the same code in Eclipse to build a jar, the joining part does not work.

Please suggest how to solve this issue.

The error occurs in the following part:

val streamkv = streamrecs.map(_.split("~")).map(r => ( r(0), (r(5), r(6))))
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => ( r(1), (r(0))))
val streamwindow = streamkv.window(Minutes(1))

val join = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines)} )

At this step I get the error: value join is not a member of org.apache.spark.rdd.RDD[(String, (String, String))]

I used the same code in the Scala shell, and it works fine there.

I have imported all the necessary packages, as shown below:

import scala.io.Source
import java.io._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.api.java.function._
import org.apache.spark.streaming._
import org.apache.spark.streaming.api._
import org.apache.spark.streaming.StreamingContext._
import StreamingContext._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark._
import org.apache.spark.rdd.PairRDDFunctions
import org.apache.spark.streaming.dstream.PairDStreamFunctions

Re: Joining Streaming Data with HDFS File

srowen — Wed, 10 Sep 2014 15:14:51 GMT

I think you imported just about everything except the one thing you need to get implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:

import org.apache.spark.SparkContext._

In the shell this is imported by default.
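
To illustrate why a single import decides whether join() is visible, here is a minimal plain-Scala sketch of the mechanism. It does not use Spark: MiniRDD, PairFunctions, MiniContext and ImplicitJoinSketch are hypothetical stand-ins invented for this illustration, loosely playing the roles of RDD, PairRDDFunctions and the SparkContext object, and the sample data is made up.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for RDD (not Spark's class): note it has no join() of its own.
final case class MiniRDD[T](items: Seq[T])

// Hypothetical stand-in for PairRDDFunctions: join() is defined here instead.
final class PairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def join[W](other: MiniRDD[(K, W)]): MiniRDD[(K, (V, W))] =
    MiniRDD(for {
      (k, v)  <- self.items
      (k2, w) <- other.items
      if k == k2
    } yield (k, (v, w)))
}

// Hypothetical stand-in for the SparkContext object that holds the implicit conversion.
object MiniContext {
  // Only code that imports MiniContext._ can see this conversion.
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): PairFunctions[K, V] =
    new PairFunctions(rdd)
}

object ImplicitJoinSketch {
  // Without this import the call to join below fails to compile with
  // a "value join is not a member of MiniRDD[...]" error.
  import MiniContext._

  // Made-up sample data shaped like the key/value pairs in the question.
  val stream = MiniRDD(Seq("a" -> ("x", "y"), "b" -> ("p", "q")))
  val lookup = MiniRDD(Seq("a" -> "1", "c" -> "3"))

  // The compiler rewrites this to toPairFunctions(stream).join(lookup).
  val joined: MiniRDD[(String, ((String, String), String))] = stream.join(lookup)

  def main(args: Array[String]): Unit = println(joined.items)
}
```

The shell pre-imports the conversions, which is why the same lines compile there but not in a standalone project. In Spark 1.3 and later these conversions live in the RDD companion object, so the explicit import is no longer required.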
