Support Questions

ArunShell · 09-10-2014

Hi,

I am joining my streaming data with data that is already present in HDFS. When I use the Scala shell it works fine, and the data gets joined.

But when I try to compile the same code in Eclipse to build a jar, the joining part does not work.

Please suggest how to solve this issue.

I am facing the error in the following part:

val streamkv = streamrecs.map(_.split("~")).map(r => ( r(0), (r(5), r(6))))
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => ( r(1), (r(0))))
val streamwindow = streamkv.window(Minutes(1))

val join = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines)} )

At this step I get the following compile error: value join is not a member of org.apache.spark.rdd.RDD[(String, (String, String))]

I used the same code in the Scala shell, and there it works fine.

I have imported all the necessary packages, as below:

import scala.io.Source
import java.io._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.api.java.function._
import org.apache.spark.streaming._
import org.apache.spark.streaming.api._
import org.apache.spark.streaming.StreamingContext._
import StreamingContext._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark._
import org.apache.spark.rdd.PairRDDFunctions
import org.apache.spark.streaming.dstream.PairDStreamFunctions

srowen · 09-10-2014

I think you imported just about everything except the one thing you need to get implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:

import org.apache.spark.SparkContext._

In the shell this is imported by default.
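
For reference, here is a minimal sketch of how the question's code might look as a standalone application once that import is added. The object name, the socket source on localhost:9999, and the 10-second batch interval are assumptions for illustration only; the original post does not show how streamrecs is created. As far as I know, from Spark 1.3 onward these implicits live in the RDD companion object, so the extra import is only needed on older versions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
// Brings the implicit conversion RDD[(K, V)] => PairRDDFunctions[K, V] into scope,
// which is what makes join() available on a pair RDD (needed on Spark versions before 1.3).
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
// The equivalent implicits for pair DStreams.
import org.apache.spark.streaming.StreamingContext._

// Hypothetical object name; not from the original post.
object StreamHdfsJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamHdfsJoin")
    val sc = new SparkContext(conf)
    // Assumed batch interval; the window below must be a multiple of it.
    val ssc = new StreamingContext(sc, Seconds(10))

    // Assumed source: the original post does not say where streamrecs comes from.
    val streamrecs = ssc.socketTextStream("localhost", 9999)

    // Key the stream by field 0, keeping fields 5 and 6 as the value.
    val streamkv = streamrecs.map(_.split("~")).map(r => (r(0), (r(5), r(6))))

    // Static reference data from HDFS, keyed by field 1.
    val hdfsLines = sc.textFile("/user/Rest/sample.dat")
      .map(_.split("~"))
      .map(r => (r(1), r(0)))

    val streamwindow = streamkv.window(Minutes(1))

    // join() resolves here only because the pair-RDD implicits are in scope.
    val joined = streamwindow.transform(rdd => rdd.join(hdfsLines))
    joined.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The master URL is left out on purpose, on the assumption that the jar is launched with spark-submit, which supplies it.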

