New Contributor
Posts: 2
Registered: ‎11-23-2017

spark streaming with linear regression spam classifier not working

[ Edited ]

Please help me with this problem. I am trying to write a streaming application that receives mails from a folder or socket using Spark Streaming and tries to decide whether they are spam or not based on linear regression. The app runs, but when I place files in the directories I see nothing happen, and I don't understand why. Please help. Note that the input for BOTH dirs, trainingDir and testDir, should be in the form

1/0 emailtext

i.e. a 1 or 0 label followed by the email text. Here is the code.

Thanks

 

package com

// general Spark imports
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// streaming imports
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds

// Spark MLlib imports
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

object SpamDetector {

  def main(args: Array[String]): Unit = {
   
    /*if (args.length != 2) {
       
      System.err.println("Usage: SpamDetector <trainingDir> <testDir>")
      System.exit(1)
    }*/
     
    // number of hashed features; note 10 is very small for text and will cause heavy collisions
    val numFeatures = 10
    
    //Create conf object
    val conf = new SparkConf().setAppName("StreamingLinearRegression")
    
    // create spark context object
    val sc = new SparkContext(conf)
    
    // Create a StreamingContext with a 5-second batch interval
    val ssc = new StreamingContext(sc, Seconds(5))
    
    
    
    def convertEmailToLabeledPoint(line: String): LabeledPoint =
    {
      val tf = new HashingTF(numFeatures)

      println("XXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
      println(line)

      // Expected input format: "<label> <email text>", where the label is 1 or 0
      val lineParts = line.split(" ")

      // Each email is split into words, and each word is mapped to one feature.
      // Use lineParts.tail so that the label itself is not hashed into the features.
      val features = tf.transform(lineParts.tail)

      println(lineParts(0).toDouble)

      LabeledPoint(lineParts(0).toDouble, features)
    }

    println("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA")
    // Create a DStream using socket for training data
    val trainingData = ssc.socketTextStream("localhost", 19000).map(convertEmailToLabeledPoint).cache()
    
    println("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB")
    // Create a DStream using socket for receiving emails
    val testData     = ssc.socketTextStream("localhost", 19001).map(convertEmailToLabeledPoint)
    
    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures))

    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    
    ssc.awaitTermination()  
   } 
}
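
Side note: spam detection is a binary classification problem, so MLlib's StreamingLogisticRegressionWithSGD is arguably a better fit than StreamingLinearRegressionWithSGD, which fits a regression line rather than a classifier. A minimal sketch of swapping the model, assuming the same trainingData/testData streams as above:

import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD

// Drop-in replacement for the regression model above; predictions are 0.0/1.0 class labels
val model = new StreamingLogisticRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(numFeatures))

model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()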

  

Expert Contributor
Posts: 152
Registered: ‎07-01-2015

Re: spark streaming with linear regression spam classifier not working

You should probably post a log or error message.
New Contributor
Posts: 2
Registered: ‎11-23-2017

Re: spark streaming with linear regression spam classifier not working

[ Edited ]

Thanks for the reply. Please, please help!

I don't see any errors in the logs, but I also don't see any results on the screen.

I'm printing some data to the screen and I don't see it anywhere.

The command I'm running is:
sudo ./spark-submit --class com.SpamDetector --master local[2] /home/jars/SpamDetector4.jar
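
Note: each socketTextStream call creates a long-running receiver that permanently occupies one core, and this job opens two of them, so --master local[2] leaves no cores free to actually process the received batches. That would explain seeing no errors and no output, and the "Starting 2 receivers" lines in the log below confirm both receivers start. The Spark Streaming docs recommend allocating more local cores than receivers, e.g.:

sudo ./spark-submit --class com.SpamDetector --master local[4] /home/jars/SpamDetector4.jar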
 
The input I send: I paste the following lines into nc consoles:
nc -l -p 19000
nc -l -p 19001
and then in console 1:
 
1 some text
1 some text.
0 Dear Spark Learner, Thanks so much for attending the Spark Summit 2014!  Check out videos of talks from the summit at ...
0 Hi, Apologies for being late about emailing and forgetting to send you the package.  I hope you and bro have been ...
0 Wow, hey Fred, just heard about the Spark petabyte sort.  I think we need to take time to try it out immediately ...
0 Hi Spark user list, This is my first question to this list, so thanks in advance for your help!  I tried running ...
0 Thanks Tom for your email.  I need to refer you to Alice for this one.  I haven't yet figured out that part either ...
0 Good job yesterday!  I was attending your talk, and really enjoyed it.  I want to try out GraphX ...
0 Summit demo got whoops from audience!  Had to let you know. --Joe

I have added a new version of the code:

package com

// general Spark imports
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// streaming imports
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds

// Spark MLlib imports
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD


object SpamDetector {

  def main(args: Array[String]): Unit = {
   
    /*if (args.length != 2) {
       
      System.err.println("Usage: SpamDetector <trainingDir> <testDir>")
      System.exit(1)
    }*/
     
    // number of hashed features; note 10 is very small for text and will cause heavy collisions
    val numFeatures = 10
    
    //Create conf object
    val conf = new SparkConf().setAppName("StreamingLinearRegression")
    
    // create spark context object
    val sc = new SparkContext(conf)
    
    // Create a StreamingContext with a 5-second batch interval
    val ssc = new StreamingContext(sc, Seconds(5))
    
    
    
    def convertEmailToLabeledPoint(line: String): LabeledPoint =
    {
      val tf = new HashingTF(numFeatures)

      println(line)

      // Expected input format: "<label> <email text>", where the label is 1 or 0
      val lineParts = line.split(" ")

      // Each email is split into words, and each word is mapped to one feature.
      // Use lineParts.tail so that the label itself is not hashed into the features.
      val features = tf.transform(lineParts.tail)

      println(lineParts(0).toDouble)

      LabeledPoint(lineParts(0).toDouble, features)
    }

   
    // Create a DStream using socket for training data
    val trainingData = ssc.socketTextStream("localhost", 19000).map(convertEmailToLabeledPoint).cache()
    
  
    // Create a DStream using socket for receiving emails
    val testData     = ssc.socketTextStream("localhost", 19001).map(convertEmailToLabeledPoint)
    
    
    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures))
      
    model.trainOn(trainingData)
    
    
    model.predictOnValues(testData
        .map(lp => (lp.label, lp.features))).foreachRDD(rdd => {
          // print each prediction in this batch (note: runs inside executor tasks)
          println("rdd.foreach(println)")
          rdd.foreach(println)
        })
    
    

    ssc.start()
    
    ssc.awaitTermination()  
   } 
}
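
One more note on the print statements: println calls inside map or rdd.foreach execute inside executor tasks, so on a cluster their output lands in the executor stdout logs rather than the spark-submit console (in local mode they happen to share the console). A sketch of a driver-side alternative that collects a few records per batch with take:

model.predictOnValues(testData.map(lp => (lp.label, lp.features)))
  .foreachRDD { rdd =>
    // take() brings up to 10 records back to the driver, so these printlns
    // always appear on the submit console
    val sample = rdd.take(10)
    println(s"predictions in this batch (showing ${sample.length}):")
    sample.foreach(println)
  }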

 

The logs (I had to trim them since they are too long):

[INFO] 2017-11-26 21:15:59,737 org.apache.spark.SparkContext logInfo -
Running Spark version 2.2.0
[WARN] 2017-11-26 21:16:00,741 org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
[INFO] 2017-11-26 21:16:01,183 org.apache.spark.SparkContext logInfo -
Submitted application: StreamingLinearRegression
[INFO] 2017-11-26 21:16:01,285 org.apache.spark.SecurityManager logInfo -
Changing view acls to: root
[INFO] 2017-11-26 21:16:01,286 org.apache.spark.SecurityManager logInfo -
Changing modify acls to: root
[INFO] 2017-11-26 21:16:01,295 org.apache.spark.SecurityManager logInfo -
Changing view acls groups to:
[INFO] 2017-11-26 21:16:01,295 org.apache.spark.SecurityManager logInfo -
Changing modify acls groups to:
[INFO] 2017-11-26 21:16:01,303 org.apache.spark.SecurityManager logInfo -
SecurityManager: authentication disabled; ui acls disabled; users with
view permissions: Set(root); groups with view permissions: Set(); users
with modify permissions: Set(root); groups with modify permissions: Set()
[INFO] 2017-11-26 21:16:02,538 org.apache.spark.util.Utils logInfo -
Successfully started service 'sparkDriver' on port 33762.
[INFO] 2017-11-26 21:16:02,627 org.apache.spark.SparkEnv logInfo -
Registering MapOutputTracker
[INFO] 2017-11-26 21:16:02,709 org.apache.spark.SparkEnv logInfo -
Registering BlockManagerMaster
[INFO] 2017-11-26 21:16:02,719
org.apache.spark.storage.BlockManagerMasterEndpoint logInfo - Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
[INFO] 2017-11-26 21:16:02,727
org.apache.spark.storage.BlockManagerMasterEndpoint logInfo -
BlockManagerMasterEndpoint up
[INFO] 2017-11-26 21:16:02,788 org.apache.spark.storage.DiskBlockManager
logInfo - Created local directory at
/tmp/blockmgr-d89284aa-84be-463f-8071-5c6502042599
[INFO] 2017-11-26 21:16:02,879 org.apache.spark.storage.memory.MemoryStore
logInfo - MemoryStore started with capacity 366.3 MB
[INFO] 2017-11-26 21:16:03,106 org.apache.spark.SparkEnv logInfo -
Registering OutputCommitCoordinator
[INFO] 2017-11-26 21:16:03,392 org.spark_project.jetty.util.log initialized
- Logging initialized @6105ms
[INFO] 2017-11-26 21:16:03,639 org.spark_project.jetty.server.Server
doStart - jetty-9.3.z-SNAPSHOT
[INFO] 2017-11-26 21:16:03,702 org.spark_project.jetty.server.Server
doStart - Started @6416ms
[INFO] 2017-11-26 21:16:03,778
org.spark_project.jetty.server.AbstractConnector doStart - Started
ServerConnector@4b4dd216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
[INFO] 2017-11-26 21:16:03,779 org.apache.spark.util.Utils logInfo -
Successfully started service 'SparkUI' on port 4040.
[INFO] 2017-11-26 21:16:03,906
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@48f5bde6{/jobs,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,907
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@5bd73d1a{/jobs/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,907
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@2555fff0{/jobs/job,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,908
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@7a0e1b5e{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,916
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@173b9122{/stages,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,917
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@7646731d{/stages/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,917
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@3b1bb3ab{/stages/stage,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,926
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@42a9a63e
{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,926
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@5d8445d7{/stages/pool,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,927
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@384fc774
{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,928
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@71e9a896{/storage,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,935
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@408b35bf{/storage/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,936
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@15bcf458{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,937
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@43c67247
{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,938
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@726386ed{/environment,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,946
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@14bb2297
{/environment/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,946
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@797501a{/executors,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,947
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@57f791c6
{/executors/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,948
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@6c4f9535
{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,956
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@30c31dd7
{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,986
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@596df867{/static,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,987
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@2ef8a8c3{/,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,995
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@63fd4873{/api,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,996
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@3aacf32a{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,997
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@82c57b3
{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:04,009 org.apache.spark.ui.SparkUI logInfo - Bound
SparkUI to 0.0.0.0, and started at http://10.0.0.4:4040
[INFO] 2017-11-26 21:16:04,095 org.apache.spark.SparkContext logInfo -
Added JAR file:/home/orenh/jars/SpamDetector4.jar at spark://
10.0.0.4:33762/jars/SpamDetector4.jar with timestamp 1511730964086
[INFO] 2017-11-26 21:16:04,462 org.apache.spark.executor.Executor logInfo -
Starting executor ID driver on host localhost
[INFO] 2017-11-26 21:16:04,550 org.apache.spark.util.Utils logInfo -
Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 44902.
[INFO] 2017-11-26 21:16:04,551
org.apache.spark.network.netty.NettyBlockTransferService logInfo - Server
created on 10.0.0.4:44902
[INFO] 2017-11-26 21:16:04,553 org.apache.spark.storage.BlockManager
logInfo - Using org.apache.spark.storage.RandomBlockReplicationPolicy for
block replication policy
[INFO] 2017-11-26 21:16:04,562 org.apache.spark.storage.BlockManagerMaster
logInfo - Registering BlockManager BlockManagerId(driver, 10.0.0.4, 44902,
None)
[INFO] 2017-11-26 21:16:04,600
org.apache.spark.storage.BlockManagerMasterEndpoint logInfo - Registering
block manager 10.0.0.4:44902 with 366.3 MB RAM, BlockManagerId(driver,
10.0.0.4, 44902, None)
[INFO] 2017-11-26 21:16:04,612 org.apache.spark.storage.BlockManagerMaster
logInfo - Registered BlockManager BlockManagerId(driver, 10.0.0.4, 44902,
None)
[INFO] 2017-11-26 21:16:04,613 org.apache.spark.storage.BlockManager
logInfo - Initialized BlockManager: BlockManagerId(driver, 10.0.0.4, 44902,
None)
[INFO] 2017-11-26 21:16:05,209
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@29a23c3d{/metrics/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:06,914
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo - Starting 2
receivers
[INFO] 2017-11-26 21:16:06,923
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo -
ReceiverTracker started
[INFO] 2017-11-26 21:16:06,944
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Slide time
= 5000 ms
[INFO] 2017-11-26 21:16:06,944
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Storage
level = Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,945
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,945
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Initialized
and validated org.apache.spark.streaming.dstream.SocketInputDStream@45b99c92
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.MappedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.MappedDStream logInfo - Storage level =
Memory Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.MappedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.MappedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.MappedDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.MappedDStream@48f33460
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.ForEachDStream@3f6e445d
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Slide time
= 5000 ms
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Storage
level = Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Initialized
and validated org.apache.spark.streaming.dstream.SocketInputDStream@63b6e981
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.MappedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.MappedDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.MappedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.MappedDStream@6ae2a469
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.MappedDStream@3be3dd61
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Storage level
= Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Initialized
and validated org.apache.spark.streaming.dstream.MapValuedDStream@76622aec
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,965
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.ForEachDStream@22cc3f40
[INFO] 2017-11-26 21:16:07,273
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo - Receiver 0
started
[INFO] 2017-11-26 21:16:07,341
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo - Receiver 1
started
[INFO] 2017-11-26 21:16:07,402 org.apache.spark.scheduler.DAGScheduler
logInfo - Got job 0 (start at SpamDetector.scala:91) with 1 output
partitions
[INFO] 2017-11-26 21:16:07,402 org.apache.spark.scheduler.DAGScheduler
logInfo - Final stage: ResultStage 0 (start at SpamDetector.scala:91)
[INFO] 2017-11-26 21:16:07,411
org.apache.spark.streaming.util.RecurringTimer logInfo - Started timer for
JobGenerator at time 1511730970000
[INFO] 2017-11-26 21:16:07,411 org.apache.spark.scheduler.DAGScheduler
logInfo - Parents of final stage: List()
[INFO] 2017-11-26 21:16:07,412
org.apache.spark.streaming.scheduler.JobGenerator logInfo - Started
JobGenerator at 1511730970000 ms
[INFO] 2017-11-26 21:16:07,413
org.apache.spark.streaming.scheduler.JobScheduler logInfo - Started
JobScheduler
[INFO] 2017-11-26 21:16:07,419 org.apache.spark.scheduler.DAGScheduler
logInfo - Missing parents: List()
[INFO] 2017-11-26 21:16:07,449
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@32f0c7f8{/streaming,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,450
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@71f96dfb
{/streaming/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,459
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@7275c74b
{/streaming/batch,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,460
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@4315e9af
{/streaming/batch/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,461
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@479f2dc2
{/static/streaming,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,461 org.apache.spark.streaming.StreamingContext
logInfo - StreamingContext started
[INFO] 2017-11-26 21:16:07,481 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting ResultStage 0 (Receiver 0 ParallelCollectionRDD[0] at
makeRDD at ReceiverTracker.scala:620), which has no missing parents
[INFO] 2017-11-26 21:16:08,162 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_0 stored as values in memory (estimated size 52.4
KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,289 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_0_piece0 stored as bytes in memory (estimated
size 17.5 KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,302 org.apache.spark.storage.BlockManagerInfo
logInfo - Added broadcast_0_piece0 in memory on 10.0.0.4:44902 (size: 17.5
KB, free: 366.3 MB)
[INFO] 2017-11-26 21:16:08,308 org.apache.spark.SparkContext logInfo -
Created broadcast 0 from broadcast at DAGScheduler.scala:1006
[INFO] 2017-11-26 21:16:08,397 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting 1 missing tasks from ResultStage 0 (Receiver 0
ParallelCollectionRDD[0] at makeRDD at ReceiverTracker.scala:620) (first 15
tasks are for partitions Vector(0))
[INFO] 2017-11-26 21:16:08,406 org.apache.spark.scheduler.TaskSchedulerImpl
logInfo - Adding task set 0.0 with 1 tasks
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Got job 1 (start at SpamDetector.scala:91) with 1 output
partitions
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Final stage: ResultStage 1 (start at SpamDetector.scala:91)
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Parents of final stage: List()
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Missing parents: List()
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting ResultStage 1 (Receiver 1 ParallelCollectionRDD[1] at
start at SpamDetector.scala:91), which has no missing parents
[INFO] 2017-11-26 21:16:08,567 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_1 stored as values in memory (estimated size 52.4
KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,585 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_1_piece0 stored as bytes in memory (estimated
size 17.5 KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,596 org.apache.spark.storage.BlockManagerInfo
logInfo - Added broadcast_1_piece0 in memory on 10.0.0.4:44902 (size: 17.5
KB, free: 366.3 MB)
[INFO] 2017-11-26 21:16:08,606 org.apache.spark.SparkContext logInfo -
Created broadcast 1 from broadcast at DAGScheduler.scala:1006
[INFO] 2017-11-26 21:16:08,615 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting 1 missing tasks from ResultStage 1 (Receiver 1
ParallelCollectionRDD[1] at start at SpamDetector.scala:91) (first 15 tasks
are for partitions Vector(0))
[INFO] 2017-11-26 21:16:08,615 org.apache.spark.scheduler.TaskSchedulerImpl
logInfo - Adding task set 1.0 with 1 tasks
[INFO] 2017-11-26 21:16:08,674 org.apache.spark.scheduler.TaskSetManager
logInfo - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor
driver, partition 0, PROCESS_LOCAL, 5414 bytes)
[INFO] 2017-11-26 21:16:08,733 org.apache.spark.scheduler.TaskSetManager
logInfo - Starting task 0.0 in stage 1.0 (TID 1, localhost, executor
driver, partition 0, PROCESS_LOCAL, 5414 bytes)
[INFO] 2017-11-26 21:16:08,743 org.apache.spark.executor.Executor logInfo -
Running task 0.0 in stage 0.0 (TID 0)
[INFO] 2017-11-26 21:16:08,761 org.apache.spark.executor.Executor logInfo -
Running task 0.0 in stage 1.0 (TID 1)
[INFO] 2017-11-26 21:16:08,792 org.apache.spark.executor.Executor logInfo -
Fetching spark://10.0.0.4:33762/jars/SpamDetector4.jar with timestamp
1511730964086
[INFO] 2017-11-26 21:16:08,989
org.apache.spark.network.client.TransportClientFactory createClient -
Successfully created connection to /10.0.0.4:33762 after 99 ms (0 ms spent
in bootstraps)
[INFO] 2017-11-26 21:16:09,029 org.apache.spark.util.Utils logInfo -
Fetching spark://10.0.0.4:33762/jars/SpamDetector4.jar to
/tmp/spark-73c4a44f-ace6-4dd0-9478-f1e47e80988f/userFiles-8c65ec0e-5b76-4dea-a5b2-291d64d6169a/fetchFileTemp8176107667231819474.tmp
[INFO] 2017-11-26 21:16:09,227 org.apache.spark.executor.Executor logInfo -
Adding
file:/tmp/spark-73c4a44f-ace6-4dd0-9478-f1e47e80988f/userFiles-8c65ec0e-5b76-4dea-a5b2-291d64d6169a/SpamDetector4.jar
to class loader
[INFO] 2017-11-26 21:16:09,591
org.apache.spark.streaming.util.RecurringTimer logInfo - Started timer for
BlockGenerator at time 1511730969600
[INFO] 2017-11-26 21:16:09,591
org.apache.spark.streaming.receiver.BlockGenerator logInfo - Started
BlockGenerator
[INFO] 2017-11-26 21:16:09,592
org.apache.spark.streaming.receiver.BlockGenerator logInfo - Started block
pushing thread






