Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Printing Fields in Spark Streaming vs Spark

avatar
Master Guru

Spark Scala Code

// Batch
val file = sc.textFile("hdfs://isi.xyz.com:8020/user/test/AsRun.txt")
val testdataframe = file.map(x => x.split("\\|"))
testdataframe.take(5)
 

//Streaming Code
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import StreamingContext._
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._
 
 
  object RatingsMatch{
                   
  
    def main(args: Array[String]){
      //set app name
        val sparkConf = new SparkConf().setAppName("RatingsMatch")
        val conf = new SparkContext(sparkConf)
        val ssc = new StreamingContext(conf, Seconds(240))
        val file = ssc.textFileStream(args(0))
       
         //file.foreachRDD(rdd=>rdd.map(x => x.split("\\|")).foreach(println))
       
        //val myfilemap = file.map(x => x.split(","))
        //myfilemap.print()
       
        val myfilemap = file.transform(rdd => {rdd.map(x => x.split("\\|"))})
       
        myfilemap.print()
        //As Run Schema
       
         //myfilemap.foreachRDD{rdd =>
        //rdd.foreach.toArray(println)
       //}
 
       
        
        ssc.start()
        ssc.awaitTermination()
       
    }
           
  }
 

I am trying to set up a Spark Streaming job. I’ve been able to get the cookie cutter sample word count to run. Now I am trying with our data. I can split and map the text file from Zeppelin or in cli using the batch engine. However, when I do the same for the dstream I get the output (pasted below the code). Any thoughts? I’ve tried a handful of approaches with Streaming using dstream.map, foreachrdd, and dstream.transform. I thought it may have been the regular expression to parse so I tried to change to a “,”. However, I still get the same results.

[Ljava.lang.String;@c080470
[Ljava.lang.String;@1d6b8b9
[Ljava.lang.String;@2876a606
[Ljava.lang.String;@7fe36aa3
[Ljava.lang.String;@3304daab
[Ljava.lang.String;@723bf02
[Ljava.lang.String;@1af86f76
[Ljava.lang.String;@7eaab8f8
[Ljava.lang.String;@6b6ee404
[Ljava.lang.String;@71af9dc4
1 ACCEPTED SOLUTION

avatar
Master Guru

Okay, silly mistake

myfilemap.foreachRDD(rdd => if (!rdd.isEmpty()) {
  rdd.collect().foreach(println)
})

View solution in original post

1 REPLY 1

avatar
Master Guru

Okay, silly mistake

myfilemap.foreachRDD(rdd => if (!rdd.isEmpty()) {
  rdd.collect().foreach(println)
})