
Change default StorageLevel for cache() and persist() operations


hi experts!

 

There are several storage levels that can be used with Spark's persist() and cache() operations.

(https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/which_storage_level...)

 

By default, MEMORY_ONLY is used for RDDs.

In my experience, MEMORY_AND_DISK_SER is often more efficient for my workloads, so I'd like to change the default StorageLevel.

Does anyone have an idea how to do this?

 

thanks!


Re: Change default StorageLevel for cache() and persist() operations


Hi, late reply but I hope it can still be useful.

 

To achieve what you want, call persist() with an explicit StorageLevel instead of cache(), which always uses the default:

from pyspark import StorageLevel

dataframe2 = dataframe1.persist(StorageLevel.MEMORY_AND_DISK)

HTH
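As far as I know, Spark exposes no configuration property that changes the default StorageLevel globally, so one lightweight workaround is to route all your persist calls through a small project-wide helper. A minimal sketch below; the helper name persist_default and the fallback string are my own, not Spark APIs:

```python
# Sketch: a project-wide default for persist(), since Spark has no
# configuration key that changes the default level of cache()/persist().
try:
    from pyspark import StorageLevel
    DEFAULT_LEVEL = StorageLevel.MEMORY_AND_DISK
except ImportError:
    # Fallback so the sketch can be exercised without a Spark installation.
    DEFAULT_LEVEL = "MEMORY_AND_DISK"

def persist_default(df, level=None):
    """Persist `df` (any object with a persist() method) using the
    project-wide default StorageLevel unless an explicit level is given.

    Note: in PySpark, cached objects are always serialized with pickle,
    so the separate _SER levels of the Scala API are not meaningful
    in Python; MEMORY_AND_DISK is already a serialized level there.
    """
    return df.persist(DEFAULT_LEVEL if level is None else level)
```

Swapping the value of DEFAULT_LEVEL in one place then changes the behavior of every call site, which is about as close to "changing the default" as the API allows.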