Questions Around Spark Cache/spillage to the disk

Expert Contributor

Guys,

I have a few questions about Spark caching and would appreciate your input.

1) How much cache memory is available to each of the executor nodes? Is there a way to control it?

2) We want to prevent developers from persisting any data to disk. Is there a configuration setting we can change to disable non-memory caching? The goal is to make sure that no secure data is spilled to disk by mistake.

3) If #2 cannot be achieved, is there a way to make sure that spillage (in case developers use the MEMORY_AND_DISK option) goes only to a secure directory and that the data is encrypted?

4) When processing streaming data with Spark, how secure is it? Can encryption be applied to data in flight?

5) If developers decide to cache streaming RDDs, how secure is that? The same concern as in #2 applies.

Thanks,

SS

1 ACCEPTED SOLUTION

Contributor

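1. This can be controlled through configuration; see http://spark.apache.org/docs/latest/configuration.html#memory-management

2. No, you cannot disable non-memory caching, but you can use only memory-based storage levels (for example MEMORY_ONLY) so that cached data is not written to disk when memory is full. With MEMORY_ONLY, partitions that do not fit in memory are recomputed when needed instead of being stored on disk.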

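3. No, the data is not encrypted, and at the time of writing there is no way to encrypt spilled data. (Spark 2.1 and later add the spark.io.encryption.enabled setting for encrypting temporary data written to local disk; check the security documentation for your version.)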

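4. It depends on the streaming source you choose. Kafka, for example, supports SSL/TLS encryption in transit, optionally combined with SASL authentication.

5. Same as #2: use only memory-based storage levels when caching streaming RDDs.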
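
To put rough numbers on point 1 and show one way to apply point 2, here is a minimal Python sketch. It assumes the unified memory manager (Spark 1.6 and later), the 300 MB reserve described in the tuning guide, and the defaults spark.memory.fraction=0.6 and spark.memory.storageFraction=0.5; check these against your Spark version. The helper names (unified_memory_mb, protected_storage_mb, persist_memory_only) are hypothetical and not part of Spark. The guard relies only on the useDisk flag that PySpark's StorageLevel objects expose.

```python
# Assumption: reserve that the unified memory manager subtracts from the heap.
RESERVED_MB = 300


def unified_memory_mb(executor_heap_mb, memory_fraction=0.6):
    """Estimate the pool shared by execution and storage (cache), in MB.

    executor_heap_mb corresponds to spark.executor.memory,
    memory_fraction to spark.memory.fraction.
    """
    return (executor_heap_mb - RESERVED_MB) * memory_fraction


def protected_storage_mb(executor_heap_mb, memory_fraction=0.6,
                         storage_fraction=0.5):
    """Estimate the part of the pool where cached blocks cannot be
    evicted by execution (spark.memory.storageFraction), in MB."""
    return unified_memory_mb(executor_heap_mb, memory_fraction) * storage_fraction


def persist_memory_only(dataset, level):
    """Hypothetical team helper: persist only with levels that never use disk.

    dataset is anything with a persist(level) method (an RDD or DataFrame);
    level is a StorageLevel-like object with a boolean useDisk attribute.
    """
    if level.useDisk:
        raise ValueError("storage level would write cached blocks to disk")
    return dataset.persist(level)
```

For a 4096 MB executor heap this estimate gives roughly 2278 MB shared between execution and storage, with about half of that protected from eviction by execution. In PySpark, persist_memory_only(rdd, StorageLevel.MEMORY_ONLY) would pass the check, while StorageLevel.MEMORY_AND_DISK would be rejected. This is a convention check for code review or shared libraries, not an enforcement mechanism, since developers can still call persist directly.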
