
Spark stream output contains extra characters


Hi

Below is the code for my Spark Streaming job. The output contains extra characters: one line before and one line after the actual output data. Here is one output file. I don't know why I am getting these "::::::::::::::" lines.

out/local-1527763545000/part-00000
::::::::::::::
(u'windows', 1)
(u'mac', 1)
(u'all', 1)
::::::::::::::

 

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Local context with two threads and a 15-second batch interval
sc = SparkContext("local[2]", "WordCount")
ssc = StreamingContext(sc, 15)

# Pick up new text files dropped into the input directory
lines = ssc.textFileStream("file:///home/aziz/in")
words = lines.flatMap(lambda line: line.split(" "))
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

# Writes one directory per batch, named local-<batch time in ms>
wordCounts.saveAsTextFiles("file:///home/aziz/out/local")

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
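
One way to narrow this down is to read a part file directly in Python and check whether any line consists only of colons. This is a minimal sketch, not a definitive diagnosis. The helper name `colon_only_lines` is hypothetical, and the full path is an assumption built from the save prefix and the output file name shown above.

```python
import os


def colon_only_lines(path):
    """Return (line_number, text) for every line made up solely of ':' characters."""
    hits = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            # A non-empty line whose only distinct character is ':'
            if text and set(text) == {":"}:
                hits.append((number, text))
    return hits


if __name__ == "__main__":
    # Assumed path: save prefix plus the batch directory from the output above.
    part_file = "/home/aziz/out/local-1527763545000/part-00000"
    if os.path.exists(part_file):
        print(colon_only_lines(part_file))
```

If this prints an empty list, the colons are not in the file itself and are probably added by whatever tool is used to display it. For example, `more` prints a banner of colons around each file name when it is given several files at once.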