Change default StorageLevel for cache() and persist() operations

Hi experts!

There are a few storage levels that can be used for Spark persist and cache operations.

(https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/which_storage_level...)

By default, MEMORY_ONLY is used.

From what I have observed, MEMORY_AND_DISK_SER may be more efficient in most of my cases.

I'd like to change the default StorageLevel used by cache() and persist().

Does anyone have an idea how to do this?

Thanks!

1 REPLY

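Hi, late reply, but I hope it can still be useful.

To get the storage level you want, pass it explicitly to persist() instead of relying on the default, something like:

from pyspark import StorageLevel  # assuming PySpark

dataframe2 = dataframe1.persist(StorageLevel.MEMORY_AND_DISK)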
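If you want one place to set your preferred level rather than repeating it at every call, a small project-level helper may be enough. This is only a sketch of that workaround: I am not aware of a Spark setting that changes the default level, and the names make_persist and persist below are hypothetical, not part of Spark. The helper only assumes the object it receives has a persist(level) method, as RDDs and DataFrames do.

```python
def make_persist(default_level):
    """Build a persist function that falls back to `default_level`.

    `make_persist` is a hypothetical helper, not a Spark API.
    """
    def persist(dataset, level=None):
        # Use the caller's level when given, otherwise the project default.
        chosen = default_level if level is None else level
        return dataset.persist(chosen)
    return persist


# Possible usage with PySpark (assumes pyspark is installed and
# `dataframe1` already exists):
#
#   from pyspark import StorageLevel
#   persist = make_persist(StorageLevel.MEMORY_AND_DISK)
#   dataframe2 = persist(dataframe1)                         # project default
#   dataframe3 = persist(dataframe1, StorageLevel.DISK_ONLY)  # one-off override
```

Note that this does not change what cache() does; cache() still uses Spark's built-in default, so you would call the helper instead of cache().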

HTH
