
Questions Around Spark Cache/spillage to the disk

Expert Contributor

Guys,

I have a few questions related to Spark cache and would like to know your inputs on the same.

1) How much cache memory is available to each executor? Is there a way to control it?

2) We want to prevent developers from persisting any data to disk. Is there a configuration we can change to disable non-memory caching? This is to make sure no secure data is spilled to disk by mistake.

3) If point #2 cannot be achieved, is there a way to make sure that spillage (in case developers use the MEMORY_AND_DISK option) happens only to a secure directory and that the data is encrypted?

4) For streaming data processed with Spark, how secure is it? Can encryption be applied to data in flight?

5) If the developers decide to cache streaming RDDs, how secure is it? And the same question as point #2 above.

Thanks,

SS

1 ACCEPTED SOLUTION


Explorer

1. Yes, this can be controlled through configuration; see http://spark.apache.org/docs/latest/configuration.html#memory-management
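To make that concrete, these are the main knobs from the memory-management section linked above; the values below are purely illustrative, not recommendations:

```
# spark-defaults.conf (illustrative values)
spark.executor.memory          4g     # total heap per executor
spark.memory.fraction          0.6    # fraction of heap shared by execution and storage
spark.memory.storageFraction   0.5    # portion of that space immune to eviction by execution
```

Cached (storage) memory per executor is roughly `spark.executor.memory * spark.memory.fraction`, of which `spark.memory.storageFraction` is protected from eviction.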

2. No, there is no configuration to disable non-memory caching outright, but you can restrict developers to memory-only storage levels (e.g. MEMORY_ONLY) to avoid spilling to disk when memory is full.
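The storage levels differ only in which tiers they use. As a minimal sketch (plain Python, not using Spark itself), here is a representative subset of Spark's built-in `StorageLevel` flags, showing which levels never touch disk:

```python
# (use_disk, use_memory) flags, transcribed from Spark's built-in
# org.apache.spark.storage.StorageLevel definitions (a representative subset).
STORAGE_LEVELS = {
    "MEMORY_ONLY":         (False, True),
    "MEMORY_ONLY_SER":     (False, True),
    "MEMORY_AND_DISK":     (True,  True),
    "MEMORY_AND_DISK_SER": (True,  True),
    "DISK_ONLY":           (True,  False),
}

def memory_only_levels(levels):
    """Return the names of storage levels that never write to disk."""
    return sorted(name for name, (use_disk, _) in levels.items() if not use_disk)

print(memory_only_levels(STORAGE_LEVELS))  # ['MEMORY_ONLY', 'MEMORY_ONLY_SER']
```

In practice this means a code-review rule (or a lint check) allowing only `persist(StorageLevel.MEMORY_ONLY)` / `MEMORY_ONLY_SER` keeps cached data off disk; there is no cluster-wide switch.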

3. No, spilled data is not encrypted, and there is currently no built-in way to encrypt it.
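Worth noting for anyone reading this later: newer Spark releases (2.1+) added local-disk I/O encryption, and `spark.local.dir` controls where spill files land. A hedged sketch, to be checked against the docs for your Spark version:

```
# spark-defaults.conf (illustrative; spark.io.encryption.* requires Spark 2.1+)
spark.local.dir                  /secure/spark-scratch   # confine spill files to a locked-down directory
spark.io.encryption.enabled      true                    # encrypt shuffle and spill files on local disk
spark.io.encryption.keySizeBits  128
```

Pointing `spark.local.dir` at a directory with restrictive permissions addresses the "secure directory" half of the question even on versions without I/O encryption.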

4. It depends on the streaming source you choose. Kafka, for example, supports SSL or SASL encryption for data in flight.
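For the Kafka case, encryption in flight is configured on the consumer side via standard Kafka client properties; the paths and password below are placeholders, not real values:

```
# Kafka consumer properties for an SSL-encrypted connection (illustrative paths)
security.protocol=SSL
ssl.truststore.location=/etc/security/kafka.client.truststore.jks
ssl.truststore.password=changeit
```

These same properties are passed to Spark's Kafka integration as the Kafka parameters map, so the wire traffic between the brokers and the Spark executors is TLS-encrypted.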

5. Same as #2: restrict caching to memory-only storage levels.

