<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Printing Fields in Spark Streaming vs Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Printing-Fields-in-Spark-Streaming-vs-Spark/m-p/153340#M44693</link>
    <description>&lt;P&gt;Okay, silly mistake&lt;/P&gt;&lt;PRE&gt;myfilemap.foreachRDD(rdd =&amp;gt; if (!rdd.isEmpty()) {
  // join each Array[String] back into a line before printing; println(arr) alone
  // would still show the array's default toString, e.g. [Ljava.lang.String;@c080470
  rdd.collect().foreach(fields =&amp;gt; println(fields.mkString("|")))
})&lt;/PRE&gt;</description>
    <pubDate>Thu, 27 Oct 2016 23:50:31 GMT</pubDate>
    <dc:creator>TimothySpann</dc:creator>
    <dc:date>2016-10-27T23:50:31Z</dc:date>
    <item>
      <title>Printing Fields in Spark Streaming vs Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Printing-Fields-in-Spark-Streaming-vs-Spark/m-p/153339#M44692</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Spark Scala Code&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;// Batch
val file = sc.textFile("hdfs://isi.xyz.com:8020/user/test/AsRun.txt")
val testdataframe = file.map(x =&amp;gt; x.split("\\|"))
testdataframe.take(5)
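// Editor's note (an assumption about the symptom, not part of the original post):
// take(5) returns Array[Array[String]], and the spark-shell / Zeppelin REPL renders
// arrays element by element, which is why this batch version looks correct. println
// on an Array uses its default toString instead, producing the
// [Ljava.lang.String;@... lines pasted at the bottom of this post. To print the
// fields from code rather than the REPL, join them first, e.g.:
//   testdataframe.take(5).foreach(fields =&amp;gt; println(fields.mkString("|")))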
 

//Streaming Code
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import StreamingContext._
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._
 
 
  object RatingsMatch {

    def main(args: Array[String]) {
        // set the app name
        val sparkConf = new SparkConf().setAppName("RatingsMatch")
        val sc = new SparkContext(sparkConf)
        val ssc = new StreamingContext(sc, Seconds(240))
        val file = ssc.textFileStream(args(0))
       
        //file.foreachRDD(rdd =&amp;gt; rdd.map(x =&amp;gt; x.split("\\|")).foreach(println))

        //val myfilemap = file.map(x =&amp;gt; x.split(","))
        //myfilemap.print()

        val myfilemap = file.transform(rdd =&amp;gt; rdd.map(x =&amp;gt; x.split("\\|")))

        myfilemap.print()
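        // Editor's note (sketch, not part of the original post): DStream.print() calls
        // toString on each element, and a Scala Array renders as
        // [Ljava.lang.String;@..., matching the output below. Joining the fields first
        // prints them readably, e.g.:
        //   myfilemap.map(fields =&amp;gt; fields.mkString("|")).print()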
        //As Run Schema

        //myfilemap.foreachRDD{rdd =&amp;gt;
        //  rdd.foreach.toArray(println)
        //}

        ssc.start()
        ssc.awaitTermination()
    }
  }
 &lt;/PRE&gt;&lt;P&gt;I am trying to set up a Spark Streaming job.  I’ve been able to get the cookie-cutter word count sample to run, and now I am trying it with our data.  I can split and map the text file from Zeppelin or the CLI using the batch engine.  However, when I do the same on the DStream, I get the output pasted below the code.  Any thoughts?  I’ve tried a handful of approaches with Streaming using dstream.map, foreachRDD, and dstream.transform.  I thought it might be the regular expression used to split, so I changed it to a “,”, but I still get the same results.&lt;/P&gt;&lt;PRE&gt;[Ljava.lang.String;@c080470
[Ljava.lang.String;@1d6b8b9
[Ljava.lang.String;@2876a606
[Ljava.lang.String;@7fe36aa3
[Ljava.lang.String;@3304daab
[Ljava.lang.String;@723bf02
[Ljava.lang.String;@1af86f76
[Ljava.lang.String;@7eaab8f8
[Ljava.lang.String;@6b6ee404
[Ljava.lang.String;@71af9dc4
&lt;/PRE&gt;</description>
      <pubDate>Thu, 27 Oct 2016 23:30:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Printing-Fields-in-Spark-Streaming-vs-Spark/m-p/153339#M44692</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-10-27T23:30:29Z</dc:date>
    </item>
    <item>
      <title>Re: Printing Fields in Spark Streaming vs Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Printing-Fields-in-Spark-Streaming-vs-Spark/m-p/153340#M44693</link>
      <description>&lt;P&gt;Okay, silly mistake&lt;/P&gt;&lt;PRE&gt;myfilemap.foreachRDD(rdd =&amp;gt; if (!rdd.isEmpty()) {
  // join each Array[String] back into a line before printing; println(arr) alone
  // would still show the array's default toString, e.g. [Ljava.lang.String;@c080470
  rdd.collect().foreach(fields =&amp;gt; println(fields.mkString("|")))
})&lt;/PRE&gt;</description>
      <pubDate>Thu, 27 Oct 2016 23:50:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Printing-Fields-in-Spark-Streaming-vs-Spark/m-p/153340#M44693</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-10-27T23:50:31Z</dc:date>
    </item>
  </channel>
</rss>

