Guys,
I have a few questions related to Spark caching and would like to hear your input on them. I've pasted a few rough sketches after the questions to show what I mean; all names and values in them are placeholders.
1) How much cache memory is available to each executor node? Is there a way to control it? (See the first sketch below.)
2) We want to prevent developers from persisting any data to disk. Is there a configuration we can change to disable non-memory caching? This is to make sure that no sensitive data is spilled to disk by mistake. (See the second sketch below.)
3) If #2 cannot be achieved, is there a way to make sure that spills (in case developers use the MEMORY_AND_DISK option) go only to a secure directory and that the data is encrypted? (See the third sketch below.)
4) How secure is processing streaming data with Spark? Can encryption be applied to data in flight? (See the last sketch below.)
5) If developers decide to cache streaming RDDs, how secure is that? The same concern as in #2 applies. (The last sketch covers this as well.)
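
For #1, my current understanding (assuming a release with the unified memory manager, i.e. 1.6+; please correct me if I'm off) is that the storage pool per executor is roughly spark.executor.memory * spark.memory.fraction * spark.memory.storageFraction, so these would be the knobs. The sizing values are placeholders; the same properties could go in spark-defaults.conf or on spark-submit:

import org.apache.spark.SparkConf

// All sizing values here are hypothetical, for illustration only.
val conf = new SparkConf()
  .setAppName("cache-sizing-sketch")
  // Total heap per executor.
  .set("spark.executor.memory", "8g")
  // Fraction of the heap shared by execution and storage.
  .set("spark.memory.fraction", "0.6")
  // Portion of that pool protected for cached (storage) blocks.
  .set("spark.memory.storageFraction", "0.5")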
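
For #2, this is the distinction I mean. As far as I can tell the storage level is chosen per persist() call and I haven't found a global switch, hence the question:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("persist-sketch").setMaster("local[2]"))
val rdd = sc.parallelize(1 to 1000)

// What we would like everyone to use: partitions that don't fit in memory
// are recomputed on demand, never written to disk.
rdd.persist(StorageLevel.MEMORY_ONLY)

// What we want to rule out: partitions that don't fit are spilled to local
// disk under spark.local.dir. (Commented out because an RDD's storage level
// can only be set once.)
// rdd.persist(StorageLevel.MEMORY_AND_DISK)

println(rdd.count())
sc.stop()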
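
For #3, the closest knobs I've found so far are below. spark.io.encryption.enabled seems to exist only in newer releases (2.1+, if I read the docs right), and the path is a placeholder; I'd like to know if there is a better option:

import org.apache.spark.SparkConf

// The mount point below is hypothetical.
val secureConf = new SparkConf()
  // Where RDD disk blocks and shuffle/spill files land; this could point at
  // an encrypted, access-restricted mount.
  .set("spark.local.dir", "/secure/spark-tmp")
  // Transparent encryption of local disk I/O (Spark 2.1+, I believe).
  .set("spark.io.encryption.enabled", "true")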
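
For #4 and #5, here is a sketch using a socket stream as a stand-in for our real source (host/port are placeholders). My understanding, which I'd like confirmed, is that spark.authenticate plus SASL encryption covers the block-transfer channels in flight, and that receiver input defaults to MEMORY_AND_DISK_SER_2 unless a memory-only level is passed explicitly:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("streaming-security-sketch")
  .setMaster("local[2]")
  // Shared-secret authentication between Spark processes.
  .set("spark.authenticate", "true")
  // SASL encryption of the block-transfer (data) channels.
  .set("spark.authenticate.enableSaslEncryption", "true")

val ssc = new StreamingContext(conf, Seconds(10))

// Explicit memory-only storage level for received blocks, instead of the
// default MEMORY_AND_DISK_SER_2, which can spill to disk.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_ONLY_SER_2)

// Cached streaming RDDs follow the same StorageLevel rules as batch RDDs.
lines.persist(StorageLevel.MEMORY_ONLY_SER)

lines.count().print()
ssc.start()
ssc.awaitTermination()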
Thanks,
SS