Created on 10-10-2018 10:51 AM - edited 09-16-2022 06:47 AM
hi experts!
there are a few storage levels that can be used for Spark persist and cache operations.
(https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/which_storage_level...)
by default, MEMORY_ONLY is used.
based on my observations, MEMORY_AND_DISK_SER may be more efficient in most of my cases.
i'd like to change the default StorageLevel accordingly. a sketch of what I do today follows.
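to illustrate (a minimal PySpark sketch; sc and the sample RDD are only placeholders for my real data, and I'm showing MEMORY_AND_DISK just for illustration), today I have to spell the level out on every persist() call instead of relying on cache():

from pyspark import SparkContext, StorageLevel

sc = SparkContext.getOrCreate()
rdd = sc.parallelize(range(1000))            # placeholder for a real dataset

rdd.cache()                                  # cache() is shorthand for persist(StorageLevel.MEMORY_ONLY) on RDDs
rdd.unpersist()                              # a persisted RDD must be unpersisted before changing its level
rdd.persist(StorageLevel.MEMORY_AND_DISK)    # an explicit level has to be passed on every call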
does anyone have an idea how to do this?
thanks!
Created on 11-12-2018 12:09 PM - edited 11-12-2018 12:09 PM
Hi, late reply, but I hope it can still be useful.
To achieve what you want, you should do something like:
from pyspark import StorageLevel
dataframe2 = dataframe1.persist(StorageLevel.MEMORY_AND_DISK)
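In full, it would look something like this (a minimal, self-contained PySpark sketch; the SparkSession and the spark.range() DataFrame are only stand-ins for your real data):

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()
dataframe1 = spark.range(1000)                 # stand-in for your real DataFrame

dataframe2 = dataframe1.persist(StorageLevel.MEMORY_AND_DISK)
dataframe2.count()                             # persist() is lazy; the first action materializes the cache
print(dataframe2.storageLevel)                 # verify which level was actually applied
dataframe2.unpersist()                         # release the cached blocks when no longer needed

One note on MEMORY_AND_DISK_SER: in PySpark the cached objects are always serialized (pickled), so as far as I know there is no separate _SER constant in the Python API; MEMORY_AND_DISK already gives you the serialized, spill-to-disk behaviour. In Scala/Java you would import org.apache.spark.storage.StorageLevel and pass StorageLevel.MEMORY_AND_DISK_SER explicitly.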
HTH