
spark streaming with linear regression spam classifier not working

New Contributor

Please help me with this problem... I am trying to write a streaming application that receives emails from a folder or a socket using Spark Streaming and tries to decide whether they are spam or not based on linear regression. The app is running, but when I place files in the directories I see that nothing happens, and I don't understand why. Please help. Note that the input for BOTH directories, trainingDir and testDir, should be in the form of

1/0 emailtext. Here is the code.

Thanks

 

package com

// general Spark imports
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// streaming imports
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds

// Spark MLlib imports
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

object SpamDetector {

  def main(args: Array[String]) {

    /*if (args.length != 2) {
      System.err.println("Usage: SpamDetector <trainingDir> <testDir>")
      System.exit(1)
    }*/

    val numFeatures = 10

    // Create the conf object
    val conf = new SparkConf().setAppName("StreamingLinearRegression")

    // Create the Spark context
    val sc = new SparkContext(conf)

    // Create a StreamingContext with a 5-second batch size
    val ssc = new StreamingContext(sc, Seconds(5))

    // Parse one input line of the form "<label> <email text>" into a LabeledPoint
    def convertEmailToLabeledPoint(line: String): LabeledPoint = {
      val tf = new HashingTF(numFeatures)

      println("XXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
      println(line)

      val lineParts = line.split(" ")

      // Each email is split into words, and each word is mapped to one feature.
      val features = tf.transform(line.split(" "))

      println(lineParts(0).toDouble)

      LabeledPoint(lineParts(0).toDouble, features)
    }

    println("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA")
    // Create a DStream over a socket for the training data
    val trainingData = ssc.socketTextStream("localhost", 19000).map(convertEmailToLabeledPoint).cache()

    println("BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB")
    // Create a DStream over a socket for receiving the emails to classify
    val testData = ssc.socketTextStream("localhost", 19001).map(convertEmailToLabeledPoint)

    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures))

    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
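
For the folder-based variant mentioned at the top (reading from trainingDir and testDir instead of sockets), the two socket streams could be swapped for textFileStream; a minimal sketch, assuming the directories come in as args(0) and args(1) (an assumption; the code above hard-codes sockets):

// Watch directories instead of sockets. Spark Streaming only picks up files
// that appear in the directory after ssc.start() is called.
val trainingData = ssc.textFileStream(args(0)).map(convertEmailToLabeledPoint).cache()
val testData     = ssc.textFileStream(args(1)).map(convertEmailToLabeledPoint)

A side benefit: textFileStream is receiverless, so it does not tie up an executor thread the way each socketTextStream receiver does.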

  

2 REPLIES

Re: spark streaming with linear regression spam classifier not working

Master Collaborator
You should probably post a log or error message.

Re: spark streaming with linear regression spam classifier not working

New Contributor

Thanks for the reply. PLEASE PLEASE PLEASE HELP....

I don't see any errors in the logs... but I also don't see any results on the screen.

I'm printing some data to the screen and I don't see it anywhere...

The command I'm running is:
sudo ./spark-submit --class com.SpamDetector --master local[2] /home/jars/SpamDetector4.jar
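
Note: each socketTextStream receiver occupies one of the local threads, so with two receivers local[2] may leave no threads free to actually process the batches; a variant of the submit command with more threads worth trying (an assumption, not the command from the post):

sudo ./spark-submit --class com.SpamDetector --master local[4] /home/jars/SpamDetector4.jar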
 
The input I send: I paste the following lines into nc consoles:
nc -l -p 19000
nc -l -p 19001
and then, in console 1:
 
1 some text
1 some text.
0 Dear Spark Learner, Thanks so much for attending the Spark Summit 2014!  Check out videos of talks from the summit at ...
0 Hi, Apologies for being late about emailing and forgetting to send you the package.  I hope you and bro have been ...
0 Wow, hey Fred, just heard about the Spark petabyte sort.  I think we need to take time to try it out immediately ...
0 Hi Spark user list, This is my first question to this list, so thanks in advance for your help!  I tried running ...
0 Thanks Tom for your email.  I need to refer you to Alice for this one.  I haven't yet figured out that part either ...
0 Good job yesterday!  I was attending your talk, and really enjoyed it.  I want to try out GraphX ...
0 Summit demo got whoops from audience!  Had to let you know. --Joe
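
For reference, the parsing those lines go through can be sanity-checked without any streaming; a minimal standalone sketch using the same HashingTF(10) setup as the app (ParseCheck is a hypothetical helper, not part of the app):

import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

object ParseCheck {
  def main(args: Array[String]): Unit = {
    val tf = new HashingTF(10) // same numFeatures as the app
    val line = "1 some text"
    val lineParts = line.split(" ")
    // the leading 1/0 becomes the label; the words become hashed term frequencies
    val lp = LabeledPoint(lineParts(0).toDouble, tf.transform(line.split(" ")))
    println(lp)
  }
}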

I have added a new version of the code:

package com

// general Spark imports
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// streaming imports
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds

// Spark MLlib imports
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

object SpamDetector {

  def main(args: Array[String]) {

    /*if (args.length != 2) {
      System.err.println("Usage: SpamDetector <trainingDir> <testDir>")
      System.exit(1)
    }*/

    val numFeatures = 10

    // Create the conf object
    val conf = new SparkConf().setAppName("StreamingLinearRegression")

    // Create the Spark context
    val sc = new SparkContext(conf)

    // Create a StreamingContext with a 5-second batch size
    val ssc = new StreamingContext(sc, Seconds(5))

    // Parse one input line of the form "<label> <email text>" into a LabeledPoint
    def convertEmailToLabeledPoint(line: String): LabeledPoint = {
      val tf = new HashingTF(numFeatures)

      println(line)

      val lineParts = line.split(" ")

      // Each email is split into words, and each word is mapped to one feature.
      val features = tf.transform(line.split(" "))

      println(lineParts(0).toDouble)

      LabeledPoint(lineParts(0).toDouble, features)
    }

    // Create a DStream over a socket for the training data
    val trainingData = ssc.socketTextStream("localhost", 19000).map(convertEmailToLabeledPoint).cache()

    // Create a DStream over a socket for receiving the emails to classify
    val testData = ssc.socketTextStream("localhost", 19001).map(convertEmailToLabeledPoint)

    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures))

    model.trainOn(trainingData)

    // Print each batch of predictions explicitly instead of DStream.print()
    model.predictOnValues(testData.map(lp => (lp.label, lp.features)))
      .foreachRDD { rdd =>
        println(" rdd.foreach(println)")
        rdd.foreach(println)
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
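
Since spam detection is a binary-classification task, MLlib's streaming logistic regression may be a closer fit than linear regression; a minimal sketch of swapping it in, reusing the same trainingData/testData DStreams as above (a suggested variant, not the posted code):

import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD

// 0/1 labels suit a classifier more naturally than a regression model
val clfModel = new StreamingLogisticRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(numFeatures))

clfModel.trainOn(trainingData)
clfModel.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()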

 

The logs (I had to trim them since they are too long):

[INFO] 2017-11-26 21:15:59,737 org.apache.spark.SparkContext logInfo -
Running Spark version 2.2.0
[WARN] 2017-11-26 21:16:00,741 org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
[INFO] 2017-11-26 21:16:01,183 org.apache.spark.SparkContext logInfo -
Submitted application: StreamingLinearRegression
[INFO] 2017-11-26 21:16:01,285 org.apache.spark.SecurityManager logInfo -
Changing view acls to: root
[INFO] 2017-11-26 21:16:01,286 org.apache.spark.SecurityManager logInfo -
Changing modify acls to: root
[INFO] 2017-11-26 21:16:01,295 org.apache.spark.SecurityManager logInfo -
Changing view acls groups to:
[INFO] 2017-11-26 21:16:01,295 org.apache.spark.SecurityManager logInfo -
Changing modify acls groups to:
[INFO] 2017-11-26 21:16:01,303 org.apache.spark.SecurityManager logInfo -
SecurityManager: authentication disabled; ui acls disabled; users with
view permissions: Set(root); groups with view permissions: Set(); users
with modify permissions: Set(root); groups with modify permissions: Set()
[INFO] 2017-11-26 21:16:02,538 org.apache.spark.util.Utils logInfo -
Successfully started service 'sparkDriver' on port 33762.
[INFO] 2017-11-26 21:16:02,627 org.apache.spark.SparkEnv logInfo -
Registering MapOutputTracker
[INFO] 2017-11-26 21:16:02,709 org.apache.spark.SparkEnv logInfo -
Registering BlockManagerMaster
[INFO] 2017-11-26 21:16:02,719
org.apache.spark.storage.BlockManagerMasterEndpoint logInfo - Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
[INFO] 2017-11-26 21:16:02,727
org.apache.spark.storage.BlockManagerMasterEndpoint logInfo -
BlockManagerMasterEndpoint up
[INFO] 2017-11-26 21:16:02,788 org.apache.spark.storage.DiskBlockManager
logInfo - Created local directory at
/tmp/blockmgr-d89284aa-84be-463f-8071-5c6502042599
[INFO] 2017-11-26 21:16:02,879 org.apache.spark.storage.memory.MemoryStore
logInfo - MemoryStore started with capacity 366.3 MB
[INFO] 2017-11-26 21:16:03,106 org.apache.spark.SparkEnv logInfo -
Registering OutputCommitCoordinator
[INFO] 2017-11-26 21:16:03,392 org.spark_project.jetty.util.log initialized
- Logging initialized @6105ms
[INFO] 2017-11-26 21:16:03,639 org.spark_project.jetty.server.Server
doStart - jetty-9.3.z-SNAPSHOT
[INFO] 2017-11-26 21:16:03,702 org.spark_project.jetty.server.Server
doStart - Started @6416ms
[INFO] 2017-11-26 21:16:03,778
org.spark_project.jetty.server.AbstractConnector doStart - Started
ServerConnector@4b4dd216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
[INFO] 2017-11-26 21:16:03,779 org.apache.spark.util.Utils logInfo -
Successfully started service 'SparkUI' on port 4040.
[INFO] 2017-11-26 21:16:03,906
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@48f5bde6{/jobs,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,907
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@5bd73d1a{/jobs/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,907
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@2555fff0{/jobs/job,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,908
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@7a0e1b5e{/jobs/job/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,916
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@173b9122{/stages,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,917
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@7646731d{/stages/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,917
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@3b1bb3ab{/stages/stage,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,926
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@42a9a63e
{/stages/stage/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,926
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@5d8445d7{/stages/pool,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,927
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@384fc774
{/stages/pool/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,928
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@71e9a896{/storage,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,935
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@408b35bf{/storage/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,936
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@15bcf458{/storage/rdd,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,937
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@43c67247
{/storage/rdd/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,938
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@726386ed{/environment,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,946
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@14bb2297
{/environment/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,946
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@797501a{/executors,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,947
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@57f791c6
{/executors/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,948
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@6c4f9535
{/executors/threadDump,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,956
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@30c31dd7
{/executors/threadDump/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,986
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@596df867{/static,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,987
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@2ef8a8c3{/,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,995
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@63fd4873{/api,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,996
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@3aacf32a{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:03,997
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@82c57b3
{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:04,009 org.apache.spark.ui.SparkUI logInfo - Bound
SparkUI to 0.0.0.0, and started at http://10.0.0.4:4040
[INFO] 2017-11-26 21:16:04,095 org.apache.spark.SparkContext logInfo -
Added JAR file:/home/orenh/jars/SpamDetector4.jar at spark://
10.0.0.4:33762/jars/SpamDetector4.jar with timestamp 1511730964086
[INFO] 2017-11-26 21:16:04,462 org.apache.spark.executor.Executor logInfo -
Starting executor ID driver on host localhost
[INFO] 2017-11-26 21:16:04,550 org.apache.spark.util.Utils logInfo -
Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 44902.
[INFO] 2017-11-26 21:16:04,551
org.apache.spark.network.netty.NettyBlockTransferService logInfo - Server
created on 10.0.0.4:44902
[INFO] 2017-11-26 21:16:04,553 org.apache.spark.storage.BlockManager
logInfo - Using org.apache.spark.storage.RandomBlockReplicationPolicy for
block replication policy
[INFO] 2017-11-26 21:16:04,562 org.apache.spark.storage.BlockManagerMaster
logInfo - Registering BlockManager BlockManagerId(driver, 10.0.0.4, 44902,
None)
[INFO] 2017-11-26 21:16:04,600
org.apache.spark.storage.BlockManagerMasterEndpoint logInfo - Registering
block manager 10.0.0.4:44902 with 366.3 MB RAM, BlockManagerId(driver,
10.0.0.4, 44902, None)
[INFO] 2017-11-26 21:16:04,612 org.apache.spark.storage.BlockManagerMaster
logInfo - Registered BlockManager BlockManagerId(driver, 10.0.0.4, 44902,
None)
[INFO] 2017-11-26 21:16:04,613 org.apache.spark.storage.BlockManager
logInfo - Initialized BlockManager: BlockManagerId(driver, 10.0.0.4, 44902,
None)
[INFO] 2017-11-26 21:16:05,209
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@29a23c3d{/metrics/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:06,914
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo - Starting 2
receivers
[INFO] 2017-11-26 21:16:06,923
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo -
ReceiverTracker started
[INFO] 2017-11-26 21:16:06,944
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Slide time
= 5000 ms
[INFO] 2017-11-26 21:16:06,944
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Storage
level = Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,945
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,945
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Initialized
and validated org.apache.spark.streaming.dstream.SocketInputDStream@45b99c92
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.MappedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.MappedDStream logInfo - Storage level =
Memory Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,953
org.apache.spark.streaming.dstream.MappedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.MappedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.MappedDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.MappedDStream@48f33460
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.ForEachDStream@3f6e445d
[INFO] 2017-11-26 21:16:06,954
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Slide time
= 5000 ms
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Storage
level = Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.SocketInputDStream logInfo - Initialized
and validated org.apache.spark.streaming.dstream.SocketInputDStream@63b6e981
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.MappedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.MappedDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,955
org.apache.spark.streaming.dstream.MappedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.MappedDStream@6ae2a469
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,963
org.apache.spark.streaming.dstream.MappedDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.MappedDStream@3be3dd61
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Storage level
= Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.MapValuedDStream logInfo - Initialized
and validated org.apache.spark.streaming.dstream.MapValuedDStream@76622aec
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Slide time =
5000 ms
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Storage level =
Serialized 1x Replicated
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Checkpoint
interval = null
[INFO] 2017-11-26 21:16:06,964
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Remember
interval = 5000 ms
[INFO] 2017-11-26 21:16:06,965
org.apache.spark.streaming.dstream.ForEachDStream logInfo - Initialized and
validated org.apache.spark.streaming.dstream.ForEachDStream@22cc3f40
[INFO] 2017-11-26 21:16:07,273
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo - Receiver 0
started
[INFO] 2017-11-26 21:16:07,341
org.apache.spark.streaming.scheduler.ReceiverTracker logInfo - Receiver 1
started
[INFO] 2017-11-26 21:16:07,402 org.apache.spark.scheduler.DAGScheduler
logInfo - Got job 0 (start at SpamDetector.scala:91) with 1 output
partitions
[INFO] 2017-11-26 21:16:07,402 org.apache.spark.scheduler.DAGScheduler
logInfo - Final stage: ResultStage 0 (start at SpamDetector.scala:91)
[INFO] 2017-11-26 21:16:07,411
org.apache.spark.streaming.util.RecurringTimer logInfo - Started timer for
JobGenerator at time 1511730970000
[INFO] 2017-11-26 21:16:07,411 org.apache.spark.scheduler.DAGScheduler
logInfo - Parents of final stage: List()
[INFO] 2017-11-26 21:16:07,412
org.apache.spark.streaming.scheduler.JobGenerator logInfo - Started
JobGenerator at 1511730970000 ms
[INFO] 2017-11-26 21:16:07,413
org.apache.spark.streaming.scheduler.JobScheduler logInfo - Started
JobScheduler
[INFO] 2017-11-26 21:16:07,419 org.apache.spark.scheduler.DAGScheduler
logInfo - Missing parents: List()
[INFO] 2017-11-26 21:16:07,449
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@32f0c7f8{/streaming,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,450
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@71f96dfb
{/streaming/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,459
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@7275c74b
{/streaming/batch,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,460
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@4315e9af
{/streaming/batch/json,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,461
org.spark_project.jetty.server.handler.ContextHandler doStart - Started
o.s.j.s.ServletContextHandler@479f2dc2
{/static/streaming,null,AVAILABLE,@Spark}
[INFO] 2017-11-26 21:16:07,461 org.apache.spark.streaming.StreamingContext
logInfo - StreamingContext started
[INFO] 2017-11-26 21:16:07,481 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting ResultStage 0 (Receiver 0 ParallelCollectionRDD[0] at
makeRDD at ReceiverTracker.scala:620), which has no missing parents
[INFO] 2017-11-26 21:16:08,162 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_0 stored as values in memory (estimated size 52.4
KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,289 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_0_piece0 stored as bytes in memory (estimated
size 17.5 KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,302 org.apache.spark.storage.BlockManagerInfo
logInfo - Added broadcast_0_piece0 in memory on 10.0.0.4:44902 (size: 17.5
KB, free: 366.3 MB)
[INFO] 2017-11-26 21:16:08,308 org.apache.spark.SparkContext logInfo -
Created broadcast 0 from broadcast at DAGScheduler.scala:1006
[INFO] 2017-11-26 21:16:08,397 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting 1 missing tasks from ResultStage 0 (Receiver 0
ParallelCollectionRDD[0] at makeRDD at ReceiverTracker.scala:620) (first 15
tasks are for partitions Vector(0))
[INFO] 2017-11-26 21:16:08,406 org.apache.spark.scheduler.TaskSchedulerImpl
logInfo - Adding task set 0.0 with 1 tasks
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Got job 1 (start at SpamDetector.scala:91) with 1 output
partitions
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Final stage: ResultStage 1 (start at SpamDetector.scala:91)
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Parents of final stage: List()
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Missing parents: List()
[INFO] 2017-11-26 21:16:08,497 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting ResultStage 1 (Receiver 1 ParallelCollectionRDD[1] at
start at SpamDetector.scala:91), which has no missing parents
[INFO] 2017-11-26 21:16:08,567 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_1 stored as values in memory (estimated size 52.4
KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,585 org.apache.spark.storage.memory.MemoryStore
logInfo - Block broadcast_1_piece0 stored as bytes in memory (estimated
size 17.5 KB, free 366.2 MB)
[INFO] 2017-11-26 21:16:08,596 org.apache.spark.storage.BlockManagerInfo
logInfo - Added broadcast_1_piece0 in memory on 10.0.0.4:44902 (size: 17.5
KB, free: 366.3 MB)
[INFO] 2017-11-26 21:16:08,606 org.apache.spark.SparkContext logInfo -
Created broadcast 1 from broadcast at DAGScheduler.scala:1006
[INFO] 2017-11-26 21:16:08,615 org.apache.spark.scheduler.DAGScheduler
logInfo - Submitting 1 missing tasks from ResultStage 1 (Receiver 1
ParallelCollectionRDD[1] at start at SpamDetector.scala:91) (first 15 tasks
are for partitions Vector(0))
[INFO] 2017-11-26 21:16:08,615 org.apache.spark.scheduler.TaskSchedulerImpl
logInfo - Adding task set 1.0 with 1 tasks
[INFO] 2017-11-26 21:16:08,674 org.apache.spark.scheduler.TaskSetManager
logInfo - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor
driver, partition 0, PROCESS_LOCAL, 5414 bytes)
[INFO] 2017-11-26 21:16:08,733 org.apache.spark.scheduler.TaskSetManager
logInfo - Starting task 0.0 in stage 1.0 (TID 1, localhost, executor
driver, partition 0, PROCESS_LOCAL, 5414 bytes)
[INFO] 2017-11-26 21:16:08,743 org.apache.spark.executor.Executor logInfo -
Running task 0.0 in stage 0.0 (TID 0)
[INFO] 2017-11-26 21:16:08,761 org.apache.spark.executor.Executor logInfo -
Running task 0.0 in stage 1.0 (TID 1)
[INFO] 2017-11-26 21:16:08,792 org.apache.spark.executor.Executor logInfo -
Fetching spark://10.0.0.4:33762/jars/SpamDetector4.jar with timestamp
1511730964086
[INFO] 2017-11-26 21:16:08,989
org.apache.spark.network.client.TransportClientFactory createClient -
Successfully created connection to /10.0.0.4:33762 after 99 ms (0 ms spent
in bootstraps)
[INFO] 2017-11-26 21:16:09,029 org.apache.spark.util.Utils logInfo -
Fetching spark://10.0.0.4:33762/jars/SpamDetector4.jar to
/tmp/spark-73c4a44f-ace6-4dd0-9478-f1e47e80988f/userFiles-8c65ec0e-5b76-4dea-a5b2-291d64d6169a/fetchFileTemp8176107667231819474.tmp
[INFO] 2017-11-26 21:16:09,227 org.apache.spark.executor.Executor logInfo -
Adding
file:/tmp/spark-73c4a44f-ace6-4dd0-9478-f1e47e80988f/userFiles-8c65ec0e-5b76-4dea-a5b2-291d64d6169a/SpamDetector4.jar
to class loader
[INFO] 2017-11-26 21:16:09,591
org.apache.spark.streaming.util.RecurringTimer logInfo - Started timer for
BlockGenerator at time 1511730969600
[INFO] 2017-11-26 21:16:09,591
org.apache.spark.streaming.receiver.BlockGenerator logInfo - Started
BlockGenerator
[INFO] 2017-11-26 21:16:09,592
org.apache.spark.streaming.receiver.BlockGenerator logInfo - Started block
pushing thread