
“val” vs “var” when using mapGroupsWithState in Spark Structured Streaming



I've seen the mapGroupsWithState example in the Databricks blog (https://databricks.com/blog/2017/10/17/arbitrary-stateful-processing-in-apache-sparks-structured-str...).

But I wonder what happens if I use "var" instead of "val" in the mapGroupsWithState function. For example, Databricks shared this example in the link above:

case class UserState(user:String, var activity:String, var start:java.sql.Timestamp, var end:java.sql.Timestamp)

They defined the state fields with "var", and the mapGroupsWithState update function looks like this:

def updateAcrossEvents(user: String,
                       inputs: Iterator[InputRow],
                       oldState: GroupState[UserState]): UserState = {
  // we simply specify an old date that we can compare against and
  // immediately update based on the values in our data
  var state: UserState =
    if (oldState.exists) oldState.get
    else UserState(user, "",
                   new java.sql.Timestamp(6284160000000L),
                   new java.sql.Timestamp(6284160L))

  for (input <- inputs) {
    state = updateUserStateWithEvent(state, input)
    oldState.update(state)
  }
  state
}
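
(For reference, this function also relies on updateUserStateWithEvent and InputRow from the same example. As far as I remember, InputRow is roughly this, though the exact fields may differ:)

case class InputRow(user: String, timestamp: java.sql.Timestamp, activity: String)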

Also, in this example, the state itself was defined as a "var" and updated with each event.

I know the difference between "var" and "val" in Scala, and I know that "val" is important for functional programming and is safer. But I don't know what happens if I use "var" as in the example above.
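
For comparison, this is roughly what I imagine a val-only version of the same loop would look like (my own sketch, not from the blog), assuming updateUserStateWithEvent returns the updated state:

def updateAcrossEventsVal(user: String,
                          inputs: Iterator[InputRow],
                          oldState: GroupState[UserState]): UserState = {
  val initial: UserState =
    if (oldState.exists) oldState.get
    else UserState(user, "",
                   new java.sql.Timestamp(6284160000000L),
                   new java.sql.Timestamp(6284160L))

  // foldLeft threads the state through every event without a mutable variable
  val finalState = inputs.foldLeft(initial) { (state, input) =>
    updateUserStateWithEvent(state, input)
  }

  // update the group state once at the end; I assume the last update wins anyway
  oldState.update(finalState)
  finalState
}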

For the above example, does using "var" create a problem?