
Spark stream output contains extra characters

Hi

Below is the code for my Spark Streaming job. The output contains a line of extra characters before and after the actual output data. Here is one example of the output; I don't know why I am getting these "::::::::::::::" lines.

out/local-1527763545000/part-00000
::::::::::::::
(u'windows', 1)
(u'mac', 1)
(u'all', 1)
::::::::::::::

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Two local threads, 15-second batch interval
sc = SparkContext("local[2]", "WordCount")
ssc = StreamingContext(sc, 15)

# Watch the input directory for newly created text files
lines = ssc.textFileStream("file:///home/aziz/in")
words = lines.flatMap(lambda line: line.split(" "))
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

# Writes one directory per batch, named local-<batch time in ms>
wordCounts.saveAsTextFiles("file:///home/aziz/out/local")

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
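
One way to check whether the "::::::::::::::" lines are really inside the part files, or are added by whatever tool prints them (for example, `more` prints a `::::::::::::::` banner around each file name when it is given several files), is to read the part files directly. This is a minimal sketch, not a definitive diagnosis: `read_part_files` and `has_colon_banner` are hypothetical helper names, and the batch directory path is assumed from the output shown above.

```python
import glob
import os


def read_part_files(output_dir):
    """Return {path: [lines]} for every part-* file directly under output_dir."""
    contents = {}
    # "part-*" skips the hidden .crc checksum files and the _SUCCESS marker
    pattern = os.path.join(output_dir, "part-*")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            contents[path] = f.read().splitlines()
    return contents


def has_colon_banner(lines):
    """True if any non-empty line consists only of ':' characters."""
    return any(line.strip() and set(line.strip()) == {":"} for line in lines)


if __name__ == "__main__":
    # Assumed batch directory, taken from the example output in this post
    batch_dir = "/home/aziz/out/local-1527763545000"
    for path, lines in read_part_files(batch_dir).items():
        print(path, "contains colon banner:", has_colon_banner(lines))
```

If this reports no colon-only lines, the characters are most likely coming from the command used to display the files rather than from Spark itself.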