
pyspark dataframe moving window concatenation of a String type column


I am using pyspark to process time-series data and want to compute moving-window aggregations. Specifically, I would like to concatenate the values of a string column within each moving time window. I tried the following code, but it gives me the error "AttributeError: 'str' object has no attribute 'over'". Any suggestions on how I can fix this? Thanks in advance!

import sys
from pyspark.sql.window import Window

l = [{'Date' : ['20150101', '20150102', '20150103', '20150104'], \
      'text' : ['text1', 'text2', 'text3', 'text4']}]
a = sqlContext.createDataFrame(l)

n= -2
windowSpec = Window.orderBy(a['Date'].desc()) \
                   .rangeBetween(n, 0)

def myFunc(x):
    y = ''.join(str(item) for item in x)
    return y

acrossDays = (myFunc('text').collect()).over(windowSpec)