
“val” vs “var” when using mapGroupsWithState in Spark Structured Streaming

I've seen a mapGroupsWithState example on the Databricks blog.

I wonder what happens if I use "var" instead of "val" inside a mapGroupsWithState function. For example, Databricks shared this case class in that post:

case class UserState(user:String, var activity:String, var start:java.sql.Timestamp, var end:java.sql.Timestamp)

They defined the state fields with "var". The mapGroupsWithState function looks like this:

def updateAcrossEvents(user: String,
    inputs: Iterator[InputRow],
    oldState: GroupState[UserState]): UserState = {
  var state: UserState =
    if (oldState.exists) oldState.get
    else UserState(user,
      "", // no activity recorded yet
      new java.sql.Timestamp(6284160000000L),
      new java.sql.Timestamp(6284160L))
  // we simply specify an old date that we can compare against and
  // immediately update based on the values in our data
  for (input <- inputs) {
    state = updateUserStateWithEvent(state, input)
    oldState.update(state)
  }
  state
}

In this example, state is also declared as a "var" and reassigned for each event.

I know the difference between "var" and "val" in Scala, and I know that "val" is important for functional programming because it is safer. But I don't know what happens if I use "var" as in the example above.

Does using "var" in the example above cause a problem?
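
For comparison, here is a rough sketch of how the same update could be written with "val" only. It is an illustration, not the Databricks code: the InputRow fields, the ImmutableUserState and ValVariant names, and the update rule are assumptions, and Option stands in for GroupState so that the snippet runs without Spark. The local "var state" in the original is only reassigned inside a single call of the function, so a foldLeft can play the same role.

```scala
import java.sql.Timestamp

// Hypothetical input type; the fields are assumed, not taken from the post.
final case class InputRow(user: String, timestamp: Timestamp, activity: String)

// Same shape as UserState above, but every field is an immutable val.
final case class ImmutableUserState(
    user: String,
    activity: String,
    start: Timestamp,
    end: Timestamp)

object ValVariant {
  // Assumed update rule: widen the [start, end] window and take the
  // activity of the event just processed.
  def updateWithEvent(state: ImmutableUserState, input: InputRow): ImmutableUserState = {
    val start = if (input.timestamp.before(state.start)) input.timestamp else state.start
    val end   = if (input.timestamp.after(state.end)) input.timestamp else state.end
    // copy returns a new instance; the old state object is left untouched.
    state.copy(activity = input.activity, start = start, end = end)
  }

  // Option stands in for GroupState here so the sketch runs without Spark.
  def updateAcrossEvents(
      user: String,
      inputs: Iterator[InputRow],
      oldState: Option[ImmutableUserState]): ImmutableUserState = {
    val initial = oldState.getOrElse(
      ImmutableUserState(user, "", new Timestamp(6284160000000L), new Timestamp(6284160L)))
    // foldLeft threads the state through the events, so no var is needed.
    inputs.foldLeft(initial)(updateWithEvent)
  }
}
```

In a real mapGroupsWithState function the result of the fold would still have to be passed to oldState.update so that it is kept for the next batch.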
