There are a few storage levels that can be used for Spark's persist and cache operations.
By default, MEMORY_ONLY is used.
In my experience, MEMORY_AND_DISK_SER is more efficient in most of my cases,
so I'd like to change the default StorageLevel.
Does anyone have an idea how to do this?
Hi, late reply but I hope it can still be useful.
There is no setting for this; you pass the storage level you want explicitly to persist(). (Note that in PySpark the serialized variants such as MEMORY_AND_DISK_SER are not exposed, since Python objects are always serialized anyway, so MEMORY_AND_DISK is the closest equivalent.) To achieve what you want, you should do something like:
from pyspark import StorageLevel

# explicitly request MEMORY_AND_DISK instead of relying on the default level
dataframe2 = dataframe1.persist(StorageLevel.MEMORY_AND_DISK)
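As far as I know, Spark does not provide a configuration property to change the default storage level globally, so the closest you can get is to centralize the choice in your own code. Here is a minimal sketch of that idea; persist_default and DEFAULT_LEVEL below are hypothetical names of my own, not part of Spark's API:

from pyspark import StorageLevel
from pyspark.sql import SparkSession, DataFrame

# Hypothetical project-wide default: change it once here if your
# workloads generally prefer a different level.
DEFAULT_LEVEL = StorageLevel.MEMORY_AND_DISK

def persist_default(df: DataFrame, level: StorageLevel = DEFAULT_LEVEL) -> DataFrame:
    # Persist the DataFrame at the project's preferred level
    # instead of Spark's built-in default.
    return df.persist(level)

spark = SparkSession.builder.appName("persist-default-example").getOrCreate()
df = persist_default(spark.range(1000000))  # toy DataFrame for illustration
df.count()                  # an action materializes and caches the data
print(df.storageLevel)      # confirm which level was actually applied
spark.stop()

Checking df.storageLevel, as in the last lines above, is a quick way to verify that the level you asked for is the one Spark is actually using.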